
Help with template

Posted: 08 Jul 2019, 23:49
by redundant
Recently I became a mod on a forum and need to do a site rip of all the photos so I can get rid of the spammy posters but keep their content. The problem is that they use several different image hosts that all have multiple redirects and popups, and I have no idea how to get around them. One poster uses a site called celebnewsfind.com to "anonymize" his posts, which makes it even more difficult. The forum site is http://celebgirls.fun/index.php and they have a sister site at youngermodel.pw/index.php
Any help would be much appreciated but not expected. Thank you.

Re: Help with template

Posted: 09 Jul 2019, 12:51
by Maksym
You can try using a generic template called "Download all images from entire website" and then add the following filters to the "Included URLs" to handle image hosts:

(?si)^https?://img\d+\.imagevenue\.com/img\.php\?
(?si)^https?://img\d+\.imagevenue\.com/[^/]+/loc\d+/
(?si)^https?://(www\.)?[^/\.\?]+\.[a-z]+/images?/[^/\?#]+$
(?si)^[^\?]+\.imagebam\.com/.+?\?download=1
(?si)^https?://(www\.)?imagflash\.com/images/
(?si)^https?://(www\.)?[^/\.\?]+\.com/[^/\?]+/[^/\?]+\.[jpeginf]+(\.html)?$
(?si)^https?://i(mg)?\d+\.[^/\.\?]+\.com/i/
(?si)^https?://(www\.)?imgbox\.com/[^/\?#]+$
(?si)^https?://i(mages)?(\d+)?\.imgbox\.com/[^\?#]+$
(?si)^https?://(www\.)?[^/\.\?]+\.[a-z]+/show/[^&/\?#]+$
(?si)^https?://i\d+\.imgchili\.net/
(?si)^https?://(www\.)?pixhost\.org/show/[^&]+$
(?si)^https?://img\d+\.pixhost\.org/images/
(?si)^https?://(www\.)?imageupper\.com/i/\?
(?si)^https?://s\d+\.imageupper\.com/\d+/
(?si)^https?://(www\.)?gogoimage\.org/img-
(?si)^https?://(www\.)?turboimagehost\.com/p/
turboimg\.net/sp/
\.imgswift\.com/files/\d+/
(?si)^https?://(www\.)?share-image\.com/gallery/[^/]+/\d+$
(?si)^https?://(www\.)?mixbase\.net/gallery/image\.php\?id=\d+$
(?si)^https?://(www\.)?mixbase\.net/gallery/media/storage/
(?si)^https?://(www\.)?[^/]+/img-
/big/
abload\.de/img/
(?si)dpic\.me/[^/]+/[^/]+\.[jpegnif]+$
(?si)^https?://(www\.)?greenpiccs\.com/.+?\.[jpegnif]+(\.html)?$
filesor\.com/pimpandhost\.com/.+?\.[jpegnif]+$
pixxxels\.cc/image/[^/]+/$
pixxxels\.cc/.+\?dl=1
pixhost\.to/show/[^\]#\?&]+$
img\d+\.pixhost\.to/images/[^\]#\?&]+$
^https?://(www\.)?directupload\.net/file/[^\[]+$
directupload\.net/images/[^\[]+$
^[^\?]+/v\.php\?id=[^&]+$
/pic_b/
\.imx\.to/i/
^[^\?'"\]]+[postimgxel]+\.cc/[^/\?#\.]+$
^[^\?'"\]]+[postimgxel]+\.cc/[^/]+/[^/]+\?dl=1$

Re: Help with template

Posted: 12 Jul 2019, 19:00
by redundant
Thank you for helping. I tried your suggestion and it didn't work at first... I figured it was because of the anonymizer site the poster was using. So I added //celebnewsfind.com/ to the include list and now it works... sort of. It's downloading from imagespice and imagetwist, but I think it needs something for the other file hosts he uses: imagefrost, imageadult, imagedrive, imgbaron, kvador, imgwallet, etc.
Should I just type them in as is, or do they need all those question marks, slashes, carrots and money signs?

Re: Help with template

Posted: 13 Jul 2019, 14:41
by Maksym
Those are not supported by Extreme Picture Finder right now. Those image hosts require you to click a button, usually called something like "Continue to image...", which uses the HTTP POST method to get to the actual image page.

As for the carrots and money signs (^ and $), those are parts of Regular Expressions. You can learn them here, for example:

https://regexone.com/
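To see what the ^ and $ do in practice, here is a small Python sketch that tests one of the imgbox filters from the list against a few made-up URLs. It assumes that an "Included URLs" filter accepts a URL when the expression is found anywhere in it (which is how unanchored entries such as /big/ would have to work); the helper name is_included and the sample URLs are hypothetical, not part of Extreme Picture Finder.

```python
import re

# One of the "Included URLs" filters from the list (imgbox page links).
IMGBOX_PAGE = r"(?si)^https?://(www\.)?imgbox\.com/[^/\?#]+$"

def is_included(pattern: str, url: str) -> bool:
    """Hypothetical helper. Assumption: a filter accepts a URL if the regex is
    found anywhere in it, so only ^ and $ pin the match to the URL's start/end."""
    return re.search(pattern, url) is not None

samples = [
    "https://imgbox.com/AbCd1234",          # one path segment: accepted
    "http://www.IMGBOX.com/AbCd1234",       # (?i) makes matching case-insensitive: accepted
    "https://imgbox.com/g/AbCd1234",        # extra "/" before the end: rejected by [^/\?#]+$
    "https://imgbox.com/AbCd1234?x=1",      # "?" is excluded before $: rejected
    "https://example.com/?u=https://imgbox.com/AbCd1234",  # ^ needs imgbox at the start: rejected
]

for url in samples:
    print(is_included(IMGBOX_PAGE, url), url)
```

So the ^ says "the URL must start here" and the $ says "the URL must end here"; without them the same pattern would also accept the last sample, because the imgbox part appears inside it.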

Re: Help with template

Posted: 19 Jul 2019, 05:59
by redundant
The program reads the thumbnail file, which is just the picture file with _t added to the end. How would I tell it to just take the _t off the end? I'll give an example:
/nkrik67f9mho_t.jpg is the end of the file it already scans and says is too small.
All it needs to do is grab /nkrik67f9mho.jpg

Another example: it scans /nkrik67f9mho/m160-s185-001.jpg.html
If it dropped the .html at the end, it would get to a page with the original picture on it, and if it dropped the .html and the whole end string and put .jpg on the end, it would go to the source page.

Re: Help with template

Posted: 19 Jul 2019, 11:55
by Maksym
You have to use Custom Parsers for that. If I had the full URLs, I could provide more precise Regular Expressions, but based on your input, here you are.

This one removes "_t":

Expression: (.*/[^/]+)_t(\.[jpegnif]+)$
Result: [#1][#2]

This one removes the trailing ".html":

Expression: (.*\.[jpegnif]+)\.html$
Result: [#1]
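A rough way to check what these two Custom Parsers produce is to run the same expressions in Python. This is only a sketch: it assumes that [#1] and [#2] in the Result field stand for capture groups 1 and 2 (like \1 and \2 in Python), and the helper apply_parser is hypothetical, not how Extreme Picture Finder itself is implemented. The sample paths are the ones from the question.

```python
import re

# The two Custom Parser expressions above, as (expression, result) pairs.
# Assumption: [#n] in the result means "text captured by group n".
REMOVE_T = (r"(.*/[^/]+)_t(\.[jpegnif]+)$", "[#1][#2]")
REMOVE_HTML = (r"(.*\.[jpegnif]+)\.html$", "[#1]")

def apply_parser(parser, url):
    """Hypothetical helper: return the rewritten URL, or None if no match."""
    expression, result = parser
    m = re.search(expression, url)
    if m is None:
        return None
    # Replace each [#n] placeholder with the text captured by group n.
    return re.sub(r"\[#(\d+)\]", lambda p: m.group(int(p.group(1))), result)

print(apply_parser(REMOVE_T, "/nkrik67f9mho_t.jpg"))
print(apply_parser(REMOVE_HTML, "/nkrik67f9mho/m160-s185-001.jpg.html"))
print(apply_parser(REMOVE_T, "/nkrik67f9mho.jpg"))  # no "_t" suffix, so no match
```

Note that [jpegnif]+ is a character class, not a list of extensions: it accepts any run of those letters, which covers jpg, jpeg, png and gif.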

But let me assure you that it's never that easy :( When you remove the trailing ".html", you are automatically redirected back to it. Once you remove "_t" to generate the direct full-size image URL, it turns out the request needs a referer pointing to the page where the full-size image is shown. The image hosting websites are really trying to make people view their ADs, not just grab the images :(

But sometimes it does actually work. The most recent success was with the "imx.to" website. You can take a look at the Custom Parsers used for it in this template:

imx.to - gallery template