Hello,
I am looking for a way to download archive files from all pages of a thread, for example: https://forum.allporncomix.com/threads/ ... aises.607/. The files are linked in the format /attachments/.. I have run several tests, but either nothing is downloaded or the crawl goes too far. Can you help me?
forum.allporncomix - archive files
-
- Site Admin
- Posts: 2431
- Joined: 02 Mar 2009, 17:02
Re: forum.allporncomix - archive files
Well, it's not that difficult. In cases like this, it's easier to exclude all addresses and then allow the software to crawl only the ones you need. For your project you need only 2 types of addresses:
1. Thread pages
2. Attachments
So, you can create a project with the following settings:
[Target files]:
*.zip
*.rar
(add other archive extensions if needed)
[ Site exploration ] / [ Regular site ]: Entire site
[ Filters ] / [ Excluded URLs ]:
.
[ Filters ] / [ Included URLs ]:
/attachments/
/threads/various-artists-3d-comics-traductions-francaises\.607/page-\d+$
That's all the settings. If you want more generic settings that can be reused as a template, use a more generic expression for the last filter in the
[ Filters ] / [ Included URLs ]:
/threads/[^/]+/page-\d+$
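The filter logic described above can be checked outside the program with a small Python sketch. This is only an approximation under two assumptions that the thread does not confirm in detail: that the filters behave like unanchored regular expression searches, and that a match in Included URLs overrides a match in Excluded URLs. The function name `is_allowed` and the example.com addresses are made up for illustration.

```python
import re

# Filters copied from the settings above.
EXCLUDED = [r"."]  # "." matches any character, so every URL is excluded
INCLUDED = [
    r"/attachments/",
    r"/threads/[^/]+/page-\d+$",
]

def is_allowed(url, included=INCLUDED, excluded=EXCLUDED):
    """Return True if the URL would be crawled (assumed semantics)."""
    # Assumption: an Included match wins over any Excluded match.
    if any(re.search(p, url) for p in included):
        return True
    # Otherwise an Excluded match blocks the URL.
    if any(re.search(p, url) for p in excluded):
        return False
    return True

# Hypothetical URLs, for illustration only.
for url in [
    "https://example.com/threads/some-thread.607/page-2",
    "https://example.com/attachments/archive-zip.123/",
    "https://example.com/members/someone.1/",
]:
    print(url, "->", is_allowed(url))
```

With these assumptions, the thread page and the attachment pass the filters, while the member profile is blocked by the catch-all exclusion.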
-
- Posts: 3
- Joined: 23 Jul 2024, 10:43
Re: forum.allporncomix - archive files
Thank you!! I'm trying to understand and it helps me a lot. However, if I use:
[ Filters ] / [ Included URLs ]:
/attachments/
/threads/[^/]+/page-\d+$
The pattern "/page-\d+$" lets the program scan all the following pages, but it skips the first one, because the first page's address does not end in page-1.
Did I forget something?
-
- Site Admin
- Posts: 2431
- Joined: 02 Mar 2009, 17:02
Re: forum.allporncomix - archive files
The first page of the thread is supposed to be your Starting address if you use these settings. And Starting addresses are always scanned.
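To illustrate why the first page does not need its own filter, here is a rough Python sketch of the behaviour described above. It assumes the generic thread filter is applied as an unanchored regular expression search and that Starting addresses bypass the filters entirely, as stated. The function name `should_scan` and the example.com addresses are hypothetical.

```python
import re

# Generic thread-page filter from the Included URLs list.
THREAD_PAGE = re.compile(r"/threads/[^/]+/page-\d+$")

def should_scan(url, starting_addresses):
    """Return True if the page would be scanned (assumed semantics)."""
    # Starting addresses are always scanned, regardless of filters.
    if url in starting_addresses:
        return True
    # Any other page must match the included thread-page pattern.
    return THREAD_PAGE.search(url) is not None

# Hypothetical first page of a thread, used as the Starting address.
first_page = "https://example.com/threads/some-thread.607/"
starting = {first_page}

print(THREAD_PAGE.search(first_page) is not None)   # filter alone: no match
print(should_scan(first_page, starting))            # scanned as Starting address
print(should_scan(first_page + "page-2", starting)) # scanned via the filter
```

So the first page is covered by being the Starting address, and the filter only has to handle page-2 onwards.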
-
- Posts: 3
- Joined: 23 Jul 2024, 10:43
Re: forum.allporncomix - archive files
Everything is ok. Thank you very much, I understand this program a little bit better :)