list of template and regex functions

hecramsey
Posts: 6
Joined: 01 Jan 2023, 02:25

list of template and regex functions

Post by hecramsey » 01 Jan 2023, 02:31

So I need to do a specific thing in a template, and I find that forum answers point to template X or template Y. Is there an index of generic functions anywhere? I.e. "click a button" or "handle a 404 error", as opposed to "how to download from a website"?

Maxim
Site Admin
Posts: 1732
Joined: 02 Mar 2009, 17:02

Re: list of template and regex functions

Post by Maxim » 01 Jan 2023, 14:03

There is a help file that comes with the program, and you can press "F1" anywhere in the program to open it. It has a detailed description of Custom Parsers along with all their functions: simply press "F1" in the project properties window when the [ Site exploration - Custom Parsers ] section is selected.

But I think you are looking for something more specific. So, let's get more specific - give me your questions with URL examples, and I'll try to give you a complete answer, explaining how to emulate button clicks with Custom Parsers and overcome "404" errors (which is not always possible).

Here is one post that you may find useful:

Limiting Extreme Picture Finder exploration

hecramsey
Posts: 6
Joined: 01 Jan 2023, 02:25

Re: list of template and regex functions

Post by hecramsey » 07 Jan 2023, 01:01

Thanks, Maxim. I just need to click a button, like the "choose file to upload" button at this link:
https://file.al/

I know how to code it in all sorts of languages, but I'm not sure how to do it in this app, or with RegEx.

Maxim
Site Admin
Posts: 1732
Joined: 02 Mar 2009, 17:02

Re: list of template and regex functions

Post by Maxim » 07 Jan 2023, 12:35

1. You cannot upload files with Extreme Picture Finder. Only download. Plus, there is a CAPTCHA there, so you can only use [ Manual login ] to enter your user name/password in the built-in browser.

2. How do you beat CAPTCHA in any language? Please tell me.

3. What is the actual end result you want to achieve with Extreme Picture Finder on this website? If you can tell me this, and it is something Extreme Picture Finder can do, I'll give you a detailed explanation, like this one.

hecramsey
Posts: 6
Joined: 01 Jan 2023, 02:25

Re: list of template and regex functions

Post by hecramsey » 08 Jan 2023, 02:11

EDIT 2: I got it. I see now how to generate the string with variables. Problem solved, THANKS!

EDIT: Am I overcomplicating this by thinking I need to code in regex?

I'm sorry, I think I wasted your time with a poorly selected example. I have written scrapers in Python, Perl and .NET, but it seems this app uses regex to generate the GETs, is that correct? I'm a friggin' data lake dev; I don't know how to use things that are user-friendly. THANKS

So I looked at the XHAMSTER template; its [ Included URLs ] section has this regex:
^[#1](www\.)?[#3]\.[#5]/photos/gallery/[^/\?&#]+-\d+(/\d+)?$
I'm guessing this means "go to every URL that matches this pattern"?
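To see what a pattern like this accepts, it can help to test it outside the program. The sketch below uses Python's `re` module and fills the template placeholders [#1], [#3] and [#5] with hypothetical values (a URL scheme, a site name `example`, and a top-level domain `com`). What the program really substitutes for those placeholders is not stated in this thread, and its RegEx dialect is assumed to behave like Python's for this pattern.

```python
import re

# The template's pattern, copied from the post.
TEMPLATE = r"^[#1](www\.)?[#3]\.[#5]/photos/gallery/[^/\?&#]+-\d+(/\d+)?$"

# HYPOTHETICAL placeholder values; the program's real substitutions are unknown.
PLACEHOLDERS = {
    "[#1]": r"https?://",  # assumed: the URL scheme
    "[#3]": r"example",    # assumed: the site name
    "[#5]": r"com",        # assumed: the top-level domain
}


def expand(template: str, values: dict[str, str]) -> str:
    """Replace each literal placeholder token with its regex fragment."""
    for token, fragment in values.items():
        template = template.replace(token, fragment)
    return template


GALLERY_URL = re.compile(expand(TEMPLATE, PLACEHOLDERS))


def is_gallery_page(url: str) -> bool:
    """True if the whole URL matches: /photos/gallery/<title>-<number>[/<page>]."""
    return GALLERY_URL.match(url) is not None
```

So, under these assumptions, `https://www.example.com/photos/gallery/summer-trip-12345` and its numbered sub-pages such as `.../summer-trip-12345/2` match, while a gallery URL without the trailing `-<number>` or with a query string does not.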

So the button I'm clicking has this redirect: https://www.somewebsite.net/download.php?id=12345
I AM terrible with regex, so in pseudocode the regex should be:

include everything that looks like "https://www.somewebsite.net/download.php?id=" + a number from 1 to 10000,
i.e.
https://www.somewebsite.net/download.php?id=1
https://www.somewebsite.net/download.php?id=2
https://www.somewebsite.net/download.php?id=3 etc.

Is this the general idea, i.e. I use the Included/Excluded URLs to pattern-match the redirects from the button?
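For reference, the pseudocode above can be written as one anchored regular expression. This is a sketch in Python's `re` syntax, assuming the program's RegEx dialect treats `\.`, `\?`, `\d{0,3}` and alternation the same way; the host is the placeholder from the question.

```python
import re

# Dots and "?" are escaped so they match literally.
# [1-9]\d{0,3} covers 1..9999 (no leading zeros); "10000" is added explicitly.
DOWNLOAD_URL = re.compile(
    r"^https://www\.somewebsite\.net/download\.php\?id=([1-9]\d{0,3}|10000)$"
)


def is_download_url(url: str) -> bool:
    """True only for download.php URLs whose id is between 1 and 10000."""
    return DOWNLOAD_URL.match(url) is not None
```

If the exact id range doesn't matter, ending the pattern with `id=\d+$` is simpler.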

Maxim
Site Admin
Posts: 1732
Joined: 02 Mar 2009, 17:02

Re: list of template and regex functions

Post by Maxim » 08 Jan 2023, 13:30

It looks like you're missing the entire concept of how Extreme Picture Finder works. The main point of it is that it can find all the necessary links automatically, without the user having to specify a Regular Expression for every URL he needs. It can crawl a website, or a directory within a website, parsing the HTML source code of the pages, looking for your [ Target Files ], and downloading and saving them completely automatically. Please read this post where I explain the basics of Extreme Picture Finder exploration: Limiting Extreme Picture Finder exploration.

Thus, you only need Regular Expressions in the [ Excluded URLs ] section to prevent Extreme Picture Finder from crawling certain URLs that are located within the main exploration area.

You only need Regular Expressions in the [ Included URLs ] section to make Extreme Picture Finder crawl additional URLs that are located outside the main exploration area (and URLs that were excluded by a wider RegEx in the [ Excluded URLs ] section). For example, pages on external image hosting websites.
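The Excluded/Included rules described in the two paragraphs above can be modelled as a small decision function. This is a hypothetical simplification, not Extreme Picture Finder's actual logic: the "main exploration area" is assumed to be the start URL's host plus its directory, and the name `should_crawl` is made up for illustration. Included patterns are checked first so that they can re-admit URLs an Excluded pattern removed.

```python
import re
from urllib.parse import urlsplit


def should_crawl(url: str, start_url: str,
                 excluded: list[str], included: list[str]) -> bool:
    """Hypothetical model of the [ Excluded URLs ] / [ Included URLs ] rules."""
    # Included patterns win: they add outside URLs and re-admit excluded ones.
    if any(re.search(p, url) for p in included):
        return True
    start, target = urlsplit(start_url), urlsplit(url)
    # Assumed "main exploration area": same host, path under the start directory.
    base_dir = start.path.rsplit("/", 1)[0] + "/"
    in_area = target.netloc == start.netloc and target.path.startswith(base_dir)
    if not in_area:
        return False
    # Inside the area, everything is crawled unless an Excluded pattern matches.
    return not any(re.search(p, url) for p in excluded)
```

With no patterns at all, only URLs inside the area are crawled, which matches the point that most projects run fine on default settings.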

You need [ Excluded Page Parts ] to make Extreme Picture Finder ignore certain parts of the HTML source code of the pages.
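One way to picture [ Excluded Page Parts ]: cut the unwanted regions out of the HTML before looking for links, so links inside them are never seen. The sketch below assumes each excluded part is defined by a literal start marker and end marker; how the program actually lets you define the parts is described in its help file, and `strip_page_parts` and `find_links` are hypothetical names.

```python
import re


def strip_page_parts(html: str, parts: list[tuple[str, str]]) -> str:
    """Remove every region from a start marker to the next end marker."""
    for start, end in parts:
        # re.escape treats the markers as literal text; DOTALL lets .*? span lines.
        pattern = re.escape(start) + r".*?" + re.escape(end)
        html = re.sub(pattern, "", html, flags=re.DOTALL)
    return html


def find_links(html: str) -> list[str]:
    """Very rough link finder, enough to show the effect of stripping."""
    return re.findall(r'href="([^"]+)"', html)
```

The non-greedy `.*?` stops at the first end marker, so nested blocks of the same tag would need a real HTML parser.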

You need [ Custom Parsers ] to generate, from certain parts of the HTML source code of the crawled pages, additional URLs that are not directly present in that source code, and to generate POST requests.
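As an illustration of what a Custom Parser does, here is a Python sketch for a hypothetical page that contains photo ids in a `data-photo-id` attribute but no direct links to the full-size images. The attribute name and the URL template `https://site.example/full/{id}.jpg` are invented for the example; the real Custom Parser syntax is documented in the program's help file (F1).

```python
import re

# Hypothetical: ids sit in an attribute, full-size URLs are not in the HTML.
ID_PATTERN = re.compile(r'data-photo-id="(\d+)"')
URL_TEMPLATE = "https://site.example/full/{id}.jpg"  # hypothetical URL scheme


def generate_urls(html: str) -> list[str]:
    """Build one new URL per id found in the page fragment."""
    return [URL_TEMPLATE.format(id=photo_id)
            for photo_id in ID_PATTERN.findall(html)]
```

The pattern extracts a variable part of the page, and the template turns each value into a URL that the crawler would not otherwise see.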

So, all 4 of the above are advanced techniques that are not required for every project. Most projects can run on default settings and deliver the results.

Now, when you already have a list of URLs that point directly to the files you want to download, this is the easiest case, and you need absolutely no Regular Expressions or additional settings. Just create a project, set the [ Exploration type ] to [ Current page only ] and select the appropriate [ Target Files ]. That's it! Here is the tutorial:

Download URL list

So, if your "download.php?id=XXX" URLs give you the actual content you want or have direct links to your [ Target Files ] - just follow the above tutorial.
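If the ids behind `download.php?id=XXX` are sequential and you want to feed them in as a URL list, the list itself can be produced with a few lines of Python. This assumes the project accepts a plain text list with one URL per line; check the Download URL list tutorial for the exact input it expects. The 1 to 10000 range and the host come from the question above; `build_url_list` and `write_url_list` are hypothetical helper names.

```python
from pathlib import Path

BASE = "https://www.somewebsite.net/download.php?id="


def build_url_list(first: int = 1, last: int = 10000) -> list[str]:
    """Return the download URLs for every id from first to last, inclusive."""
    return [f"{BASE}{i}" for i in range(first, last + 1)]


def write_url_list(path: str, urls: list[str]) -> None:
    """Write one URL per line (assumed input format for the URL list)."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")
```

Usage: `write_url_list("urls.txt", build_url_list())` writes 10000 lines, from `...?id=1` to `...?id=10000`.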

If not, please give me more information and I'll try to explain further. It certainly does look like you're trying to overcomplicate things with Extreme Picture Finder. This software is for users, not only for developers. Well, some parts of it, at least ;-)

hecramsey
Posts: 6
Joined: 01 Jan 2023, 02:25

Re: list of template and regex functions

Post by hecramsey » 23 Jan 2023, 22:28

I'm good thanks so much. I see this is more subtractive -- without exclusions and filters it will go to any and every target. I'm curious about how it reads a URL from the text of a button; no need to answer, just curious, thx. The way I built scrapers was to write a regex to extract every matching URL, but while my regex was good back then, I haven't touched it in years, so screw that. THANKS
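For comparison, the regex-only approach mentioned here usually looks something like the sketch below: one pattern run over the raw page text. This is a generic illustration, not the poster's actual scraper; note that it only finds absolute `http(s)://` URLs and misses relative links such as `/relative/path`.

```python
import re

# Absolute http(s) URLs, ending at whitespace, a quote or an angle bracket.
URL_RE = re.compile(r"""https?://[^\s"'<>]+""")


def extract_urls(text: str) -> list[str]:
    """Return every absolute URL found anywhere in the text, in order."""
    return URL_RE.findall(text)
```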

Maxim
Site Admin
Posts: 1732
Joined: 02 Mar 2009, 17:02

Re: list of template and regex functions

Post by Maxim » 24 Jan 2023, 11:11

hecramsey wrote: I'm good thanks so much. I see this is more subtractive -- without exclusions and filters it will go to any and every target.
... to any and every target within the project download limits.
hecramsey wrote: I'm curious about how it reads a URL from the text of a button
It just parses the entire HTML source code of a page to find and extract all the URLs it can (links, images, buttons, forms, anything). Then the program decides which of those URLs are within the project limits and adds them to the download queue. It doesn't really care where a URL came from (button or link, it doesn't matter).
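The behaviour described in this answer (parse the whole page, collect every URL regardless of where it appears, then keep only those within the project limits) can be sketched with Python's standard `html.parser`. This is an illustrative approximation, not the program's implementation: the attribute list, the `onclick` pattern for simple `location.href='...'` buttons, and the prefix-based `within_limits` check are all assumptions.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes assumed to carry URLs on links, images, forms and buttons.
URL_ATTRS = {"href", "src", "action", "formaction"}
# Assumed button style: onclick="location.href='...'" with a quoted literal URL.
ONCLICK_URL = re.compile(r"""location(?:\.href)?\s*=\s*['"]([^'"]+)['"]""")


class UrlCollector(HTMLParser):
    """Collects URLs from every start tag, whatever element they are on."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name in URL_ATTRS:
                # Resolve relative links against the page URL.
                self.urls.append(urljoin(self.base_url, value))
            elif name == "onclick":
                for found in ONCLICK_URL.findall(value):
                    self.urls.append(urljoin(self.base_url, found))


def collect_urls(html: str, base_url: str) -> list[str]:
    """Return all URLs found in the page, in document order."""
    parser = UrlCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


def within_limits(url: str, allowed_prefix: str) -> bool:
    """Stand-in for the project limits: a simple URL prefix check."""
    return url.startswith(allowed_prefix)
```

Because the collector looks at attributes rather than at which element they sit on, a URL behind a button ends up in the same list as an ordinary link, and only the limits check decides whether it is queued.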
