Hi there,
I'm looking to download all pictures from some threads on www.sxnarod.com. This is a TGP website with www.backbook.me as the image host.
Example of a thread I want to download pictures from, where one link on sxnarod.com can lead to a group of images on backbook.me:
https://www.sxnarod.com/victoria-s-secret-angels-t.html
Would appreciate your help with the template.
Thank you.
snxnarod download
-
- Site Admin
- Posts: 2084
- Joined: 02 Mar 2009, 17:02
Re: snxnarod download
This is not a TGP website; it is a forum. The best way to download from a forum thread is "exclude everything, include only required URLs". Create a project and set Exploration type to Regular. Then exclude all URLs by adding
.*
to the Excluded URLs filters. Then add Regular Expressions for the required URLs to the Included URLs. For any forum thread you need an expression that matches every page of that thread. Thread page URLs usually differ only by a number, which can be replaced with \d+ in the filter's Regular Expression. In this case it would be:
\.com/victoria-s-secret-angels_\d+-t\.html$
Now, this forum uses an image hosting website for images, so you need to add a regular expression that will match any image page from that hosting:
backbook\.me/photo-[^/\?]+\.html$
And finally you need to add a filter that matches the path to any full-size image on this hosting:
backbook\.me/full/
That's it.
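The effect of these filters can be checked outside EPF with a small Python sketch. It assumes that a match in the Included URLs takes priority over the catch-all exclusion and that filters are matched anywhere in the URL; the `is_allowed` helper and the URLs marked hypothetical are illustrations only, not part of EPF or taken from the site.

```python
import re

# Filters quoted from the post above.
EXCLUDED = [r".*"]
INCLUDED = [
    r"\.com/victoria-s-secret-angels_\d+-t\.html$",
    r"backbook\.me/photo-[^/\?]+\.html$",
    r"backbook\.me/full/",
]

def is_allowed(url):
    """Return True if the URL would be followed under the assumed filter logic."""
    # Assumption: an Included match overrides any Excluded match.
    if any(re.search(pattern, url) for pattern in INCLUDED):
        return True
    # Anything else is dropped by the catch-all ".*" exclusion.
    return not any(re.search(pattern, url) for pattern in EXCLUDED)

urls = [
    "https://www.sxnarod.com/victoria-s-secret-angels_2-t.html",  # hypothetical later page
    "https://www.backbook.me/photo-c193723503.html",              # image page
    "https://dl.backbook.me/full/example.jpg",                    # hypothetical full-size image
    "https://www.sxnarod.com/index.php",                          # hypothetical unrelated page
]
for url in urls:
    print(is_allowed(url), url)
```

Note that the first page of the thread (victoria-s-secret-angels-t.html) has no _N suffix, so the thread filter does not match it; this sketch assumes it is used as the project's starting URL.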
-
- Posts: 3
- Joined: 01 Mar 2018, 03:23
Re: snxnarod download
It seems to start working now.
The problem is, I can only manage to download a fraction of all linked images.
In one thread it parses all the backbook.me links (1200), but only a handful of them are dl.backbook.me/full (42), which resulted in 42 downloaded pics.
-
- Site Admin
- Posts: 2084
- Joined: 02 Mar 2009, 17:02
Re: snxnarod download
OK, have you seen where the rest of them are located? Do you know how to check?
-
- Posts: 3
- Joined: 01 Mar 2018, 03:23
Re: snxnarod download
One of the images that were not downloaded - https://www.backbook.me/photo-c193723503.html
This is the location - https://d.backbook.me/file/2017/11/23/8 ... 723503.jpg
I guess I need an additional Included URL filter, but I don't know how to form it myself.
Another question, regarding naming. Is it possible to rename files in format 'generated numerical file name_original file name', like 0000_c19372350 for example?
Thanks
-
- Site Admin
- Posts: 2084
- Joined: 02 Mar 2009, 17:02
Re: snxnarod download
In the Included URLs filter (as well as in the Excluded URLs) you have to add the common part of all similar URLs. In this case the common part of all full-size images hosted on backbook.me is the domain name and the text "full" as a part of the URL. So you can change that last filter to the following:
backbook\.me([^\?]+)?/full
to include all the URLs.
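To see what the change does, the old and new filters can be compared in Python. This is only a sketch: the two image URLs are hypothetical placeholders (the real path in the post above is truncated, so whether it matches depends on the elided part), and matching anywhere in the URL is an assumption about how EPF applies filters.

```python
import re

OLD = re.compile(r"backbook\.me/full/")          # original filter
NEW = re.compile(r"backbook\.me([^\?]+)?/full")  # widened filter

samples = [
    "https://dl.backbook.me/full/example.jpg",          # hypothetical: "full" right after the domain
    "https://d.backbook.me/some/dir/full/example.jpg",  # hypothetical: "full" deeper in the path
    "https://www.backbook.me/photo-c193723503.html",    # image page, not a full-size image
]
for url in samples:
    # Columns: matched by old filter, matched by new filter, URL.
    print(bool(OLD.search(url)), bool(NEW.search(url)), url)
```

The optional group ([^\?]+)? lets any path segments sit between the domain and /full, which is why the second URL matches only the new filter. A URL with no /full anywhere in its path would still not match.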
Regarding naming: there is an Expert tab in the Naming section of the project properties where you can use a Regular Expression of the file URL or any parent URL to create a file name, but the current version of EPF does not allow combining number generation with the original file name. I'm adding it to the to-do list for future versions.
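Until such an option exists, one possible workaround is to rename the files after downloading, outside EPF. Below is a minimal Python sketch, assuming all downloaded files sit in one folder and that sorted file-name order is acceptable for numbering; `numbered_name` and `rename_all` are hypothetical helpers, not EPF features.

```python
from pathlib import Path

def numbered_name(index, original, width=4):
    """Build '<zero-padded index>_<original file name>'."""
    return f"{index:0{width}d}_{original}"

def rename_all(folder, width=4):
    """Rename every file in the folder, numbering them in sorted name order."""
    # Collect the files first so renamed files are not picked up again.
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    for index, path in enumerate(files):
        path.rename(path.with_name(numbered_name(index, path.name, width)))

print(numbered_name(0, "c193723503.jpg"))  # 0000_c193723503.jpg
```

Running `rename_all` twice on the same folder would add a second prefix, so it is safer to try it on a copy of the download folder first.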