Excluding subdirectory

kaosnews
Posts: 10
Joined: 21 Sep 2017, 15:38

Excluding subdirectory

Post by kaosnews » 25 Sep 2017, 16:45

I'm downloading pictures from a directory but want to exclude its sub-directory.
So it's basically this:
https://www.site.com/images/image01.jpg -> want to download this
https://www.site.com/images/thumbs/image01.jpg -> don't want to download this

In Included URLs I have this:
https://www.site.com/images/[^#]+$

And in Excluded URLs I have this:
.*
https://www.site.com/images/thumbs/[^#]+$

But this doesn't work. I'm really stuck here, and possibly the solution is too easy... ;)
Maybe it's possible (I can't find it in the Regular Expressions documentation) to download only the files in the root of a directory and not those in sub-directories?
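A quick way to see what these patterns match is to test them outside the program. The Python sketch below assumes the filters behave like Python's `re.search` (the pattern may be found anywhere in the URL); that is an assumption, not documented program behavior. It shows that `[^#]+$` also accepts slashes, so the Included pattern matches the thumbnail URL too, and that `.*` matches every URL.

```python
import re

# The two example URLs from the question.
wanted = "https://www.site.com/images/image01.jpg"
thumb = "https://www.site.com/images/thumbs/image01.jpg"

# Patterns copied from the post (dots unescaped, as typed there).
included = r"https://www.site.com/images/[^#]+$"
excluded = [r".*", r"https://www.site.com/images/thumbs/[^#]+$"]

def matches(pattern: str, url: str) -> bool:
    # Assumption: the pattern is searched for anywhere in the URL.
    return re.search(pattern, url) is not None

for url in (wanted, thumb):
    print(url)
    # [^#]+ allows "/", so "thumbs/image01.jpg" satisfies it as well.
    print("  included:", matches(included, url))
    # ".*" appears in this list for both URLs.
    print("  excluded by:", [p for p in excluded if matches(p, url)])
```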

Maxim
Site Admin
Posts: 882
Joined: 02 Mar 2009, 17:02

Re: Excluding subdirectory

Post by Maxim » 25 Sep 2017, 17:52

If your files are on the same domain name as your starting URL, then you do not need to use "Included URLs" at all. You just need to exclude

/thumbs/

and that's it.
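A minimal Python sketch of that single exclusion. It assumes the program checks whether the pattern occurs anywhere in the URL, as `re.search` does. `/thumbs/` contains no special regex characters, so it just matches that literal text.

```python
import re

urls = [
    "https://www.site.com/images/image01.jpg",
    "https://www.site.com/images/thumbs/image01.jpg",
]

# Literal text: matches any URL that contains "/thumbs/".
exclude = re.compile(r"/thumbs/")

kept = [u for u in urls if not exclude.search(u)]
print(kept)  # ['https://www.site.com/images/image01.jpg']
```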

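But if you really need to use both filters to get your files, then use this one in "Included URLs". Adding / to the excluded characters in the brackets means the match can't run on into a sub-directory such as /thumbs/: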

https://www.site.com/images/[^#/]+$
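To check this pattern against the two example URLs, here is a short Python sketch. The dots are escaped here so they only match a literal "."; the unescaped version above behaves the same for these URLs.

```python
import re

# [^#/]+$ : one or more characters that are neither "#" nor "/",
# running to the end of the URL, so deeper paths cannot match.
include = re.compile(r"https://www\.site\.com/images/[^#/]+$")

root_file = "https://www.site.com/images/image01.jpg"
sub_file = "https://www.site.com/images/thumbs/image01.jpg"

print(root_file, bool(include.search(root_file)))  # True
print(sub_file, bool(include.search(sub_file)))    # False
```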

kaosnews wrote:
"Maybe it's possible (can't find it in Regular expression documentation) to download only files in root of directory and not sub-directories?"

Regular Expressions are the last resort, so to speak. There are easier ways in Extreme Picture Finder. For example, you can limit the exploration depth, or use the Exploration option "Current page only" or "Current directory and deeper"...
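The "root of the directory only" idea can also be expressed without any Regular Expression, by comparing URL paths. This is a hypothetical illustration of the concept using Python's `urllib.parse`. It is not how Extreme Picture Finder implements its exploration options, and `START`, `directory_only` and `directory_and_deeper` are made-up names.

```python
import posixpath
from urllib.parse import urlparse

START = "https://www.site.com/images/"  # hypothetical starting address

def directory_only(url: str, start: str = START) -> bool:
    """File lies directly in the start directory (no sub-directories)."""
    u, s = urlparse(url), urlparse(start)
    return u.netloc == s.netloc and posixpath.dirname(u.path) == s.path.rstrip("/")

def directory_and_deeper(url: str, start: str = START) -> bool:
    """File lies in the start directory or any sub-directory below it."""
    u, s = urlparse(url), urlparse(start)
    return u.netloc == s.netloc and u.path.startswith(s.path.rstrip("/") + "/")

for url in ("https://www.site.com/images/image01.jpg",
            "https://www.site.com/images/thumbs/image01.jpg"):
    print(url, directory_only(url), directory_and_deeper(url))
```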

If you give me some real URLs and tell me which files you need, I can try to help with the settings.

kaosnews
Posts: 10
Joined: 21 Sep 2017, 15:38

Re: Excluding subdirectory

Post by kaosnews » 26 Sep 2017, 13:14

I think your solution is working!

I wanted to download every image from: https://8ch.net/strek/catalog.html

- Starting addresses: https://8ch.net/strek/catalog.html
- Site exploration > Regular site: Current directory and deeper

- Save > Sub-folders > Use target file parent URL Regular Expression to create sub-folder
I set this in Regular Expression: https://8ch.net/(.+?)/res/([^&]+).html
I checked Case-insensitive (/i) and Single-line (/s); I don't know if this is needed.
... and in Results I added: [#2]

- Excluded URLs
.*
/thumb/

- Included URLs
https://8ch.net/(.+?)/res/([^&]+).html
https://media.8ch.net/file_store/[^#/]+$
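These patterns can be tried in Python before running the project. In this sketch the thread and media URLs are invented examples, and reading `[#2]` as "the text captured by group 2" is my interpretation of the Results field, not something confirmed here. `re.I` and `re.S` mirror the Case-insensitive and Single-line options. Single-line (/s) conventionally only lets `.` match a newline, and URLs contain none, so it makes no difference for these patterns.

```python
import re

# Sub-folder regex from the post (dots unescaped, as typed there).
folder_re = re.compile(r"https://8ch.net/(.+?)/res/([^&]+).html", re.I | re.S)

# Included URLs and the /thumb/ exclusion from the post.
included = [
    folder_re,
    re.compile(r"https://media.8ch.net/file_store/[^#/]+$"),
]
thumb_filter = re.compile(r"/thumb/")

# Hypothetical URLs, invented for illustration only.
thread_url = "https://8ch.net/strek/res/12345.html"
file_url = "https://media.8ch.net/file_store/abc123.jpg"
thumb_url = "https://media.8ch.net/file_store/thumb/abc123.jpg"

# Group 1 captures the board name, group 2 the thread number.
m = folder_re.search(thread_url)
print("sub-folder:", m.group(2))  # sub-folder: 12345

for url in (thread_url, file_url, thumb_url):
    inc = any(p.search(url) for p in included)
    hits_thumb = thumb_filter.search(url) is not None
    print(url, "| included:", inc, "| matches /thumb/:", hits_thumb)
```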

Maxim
Site Admin
Posts: 882
Joined: 02 Mar 2009, 17:02

Re: Excluding subdirectory

Post by Maxim » 26 Sep 2017, 17:01

Good job! I'm glad you're learning! Here is another tip: if you use

.*

in Excluded URLs (or anywhere else), it means EVERYTHING. Literally. The dot stands for [any character] and the asterisk for [0 or more times], so a Regular Expression made of a dot and an asterisk means "any number of any characters". Once you add this expression to a filter, it's pointless to add anything else.
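A tiny Python check shows why: `.*` can always match, at worst as an empty string, so it matches every URL you give it.

```python
import re

# "." = any character except newline, "*" = zero or more times.
# Zero repetitions are allowed, so the pattern can always match.
samples = ["https://8ch.net/strek/catalog.html", "/thumb/", ""]
print([re.search(r".*", s) is not None for s in samples])  # [True, True, True]
```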

You were right about the "Current directory and deeper" exploration method for this project. In fact, it is the only setting you had to choose; you could leave the rest of the settings at their default values.

So, my settings would be:

Regular site -> Current directory and deeper (Download from external sites checked)
Target files: *.jp*, *.png, *.gif
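Target file masks are wildcards, not Regular Expressions. Assuming they follow the usual `*` wildcard rules (an assumption; the program's exact matching, for example its case handling, may differ), Python's `fnmatch` module can preview which file names they accept. `is_target` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Masks from the post; "*" means any run of characters (including none),
# so "*.jp*" covers both .jpg and .jpeg.
MASKS = ["*.jp*", "*.png", "*.gif"]

def is_target(filename: str) -> bool:
    # Lower-case first so "abc.PNG" is treated like "abc.png"
    # (assumed case-insensitive, like most Windows file masks).
    name = filename.lower()
    return any(fnmatchcase(name, mask) for mask in MASKS)

for name in ["abc.jpg", "abc.jpeg", "abc.PNG", "abc.webm", "catalog.html"]:
    print(name, is_target(name))
```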

That's it! The default project settings already include the

(?si)thumb

filter, which excludes all thumbnails. So, just as I told you before, Regular Expressions are meant to be used only if the default project settings fail.
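For the curious, here is what that default filter matches, sketched with Python's `re`. `(?si)` turns on two inline flags: s (dot also matches newline) and i (ignore case). "thumb" contains no dot, so only the i flag matters: any URL containing "thumb" in any letter case is caught. The URLs are hypothetical examples.

```python
import re

thumb_filter = re.compile(r"(?si)thumb")

urls = [
    "https://media.8ch.net/file_store/thumb/abc123.jpg",  # hypothetical
    "https://www.site.com/images/Thumbs/image01.jpg",     # hypothetical
    "https://media.8ch.net/file_store/abc123.jpg",        # hypothetical
]

# The first two contain "thumb"/"Thumb" and would be excluded.
excluded = [u for u in urls if thumb_filter.search(u)]
print(excluded)
```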
