
Fine Art America - fineartamerica.com

Posted: 25 Aug 2024, 21:35
by jojo1000
Could an expert kindly create a script for this awesome website, please?

Re: Fine Art America - fineartamerica.com

Posted: 26 Aug 2024, 12:29
by Maksym
Have you tried the built-in generic templates? If you want the images of one product, use the "Download all images from a single-page gallery" template. If you want everything from this website, use the "Download all images from entire website" template. You can apply filters and limits to avoid downloading thumbnails and website design elements.

Re: Fine Art America - fineartamerica.com

Posted: 26 Aug 2024, 16:20
by jojo1000
Yes, thank you, I did try it. But it downloads only the lowest-quality images available.

My goal is to download all images from, let's say, this page:
https://fineartamerica.com/art/photographs/roger+moore

I want the software to download images at the quality shown on pages like these:
https://fineartamerica.com/featured/spa ... chive.html
https://fineartamerica.com/featured/sha ... -ruck.html

How can I make this possible?

Re: Fine Art America - fineartamerica.com

Posted: 26 Aug 2024, 17:53
by Maksym
Well, in this case you'll need to use the advanced configuration options. It's not going to be easy if you don't know a little HTML. You need to limit the crawl to only the pages you need. So, first of all, make sure you have [ Entire website ] selected in the [ Regular site ] section of the project properties. Then exclude all the pages of the website by adding

^https?://(www\.)?fineartamerica\.com

to the [ Excluded URLs ].
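A quick way to see which addresses that exclusion pattern catches is to test it in Python. This is only a sketch: it assumes the [ Excluded URLs ] entries are treated as regular expressions searched in the URL (the ^ pins this one to the start of the address), and the image subdomain in the last sample URL is hypothetical.

```python
import re

# The exclusion pattern from the post. Because of the ^ anchor, re.search
# only matches URLs that *start* with the main site's address.
EXCLUDE_SITE = re.compile(r"^https?://(www\.)?fineartamerica\.com")

# Sample URLs; the last one uses a hypothetical image subdomain.
samples = [
    "https://fineartamerica.com/art/photographs/roger+moore",      # excluded
    "http://www.fineartamerica.com/featured/example.html",         # excluded
    "https://images.fineartamerica.com/hypothetical/photo.jpg",    # allowed
]

for url in samples:
    excluded = EXCLUDE_SITE.search(url) is not None
    print("EXCLUDED" if excluded else "allowed", url)
```

Because the pattern has to match from the very beginning of the URL, addresses on other subdomains are not caught by it, so files served from a different host are unaffected by this rule.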

Then add filters to the [ Included URLs ] so that the software crawls only the pages with the full-size photos. It looks like adding

/featured/

is going to be enough. On its own, this already does the job: you'll get the full-size images (the ones shown on the website, not the ones hidden behind the "preview"). To get rid of the lower-resolution images and website assets, you can also add the following filters to the [ Excluded URLs ]:

/assets/
/400/
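Taken together, the URL filters can be modelled as a single yes/no decision per URL. The Python sketch below assumes two things the post does not spell out: that every entry is a regular expression searched anywhere in the URL, and that an [ Included URLs ] match takes priority over an [ Excluded URLs ] match (which is how "exclude the whole site, then include /featured/" is meant to work). The example URLs are hypothetical.

```python
import re

# Hypothetical model of the filter settings described above.
# Assumptions: entries are regexes searched anywhere in the URL, and an
# [ Included URLs ] match overrides an [ Excluded URLs ] match.
EXCLUDED_URLS = [
    r"^https?://(www\.)?fineartamerica\.com",
    r"/assets/",
    r"/400/",
]
INCLUDED_URLS = [
    r"/featured/",
]


def matches_any(url: str, patterns: list[str]) -> bool:
    """Return True if any of the patterns is found in the URL."""
    return any(re.search(p, url) for p in patterns)


def should_fetch(url: str) -> bool:
    """Decide whether the crawler would follow or download this URL."""
    if matches_any(url, INCLUDED_URLS):
        return True  # explicitly included, even if an exclusion also matches
    return not matches_any(url, EXCLUDED_URLS)


for url in [
    "https://fineartamerica.com/featured/example-photo.html",  # True
    "https://fineartamerica.com/art/photographs/roger+moore",  # False
    "https://images.example.com/400/thumb.jpg",                # False
    "https://images.example.com/full/photo.jpg",               # True
]:
    print(should_fetch(url), url)
```

Under these assumptions, /featured/ pages on the main site are followed, every other main-site page is skipped, and on other hosts only URLs containing /assets/ or /400/ are dropped.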

And now, for the final touches, you can open the HTML source of the pages you need to crawl and exclude the parts of those pages that don't contain the information you need by adding them to the [ Excluded Page Parts ]. Here's what I came up with:


From1=
To1=<div class='leftdiv'

From2=<div class='rightdiv
To2=

From3=
To3=<div id='imageFlowContainerDiv'

From4=<div id='searchEngineFooterDiv'>
To4=
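To see what these From/To pairs do to a page, here is a Python sketch of one possible interpretation, run on a tiny made-up page. The semantics are assumptions, not documented behaviour: an empty From means the start of the page, an empty To means the end, the removed span starts at the From text and stops just before the To text, and a rule whose markers are missing from a page does nothing.

```python
# Hypothetical model of [ Excluded Page Parts ]; the semantics are assumed:
# - empty From = start of page, empty To = end of page;
# - a span runs from the From text (removed) up to the To text (kept);
# - all rules are located in the original page and their spans are merged.
Rule = tuple[str, str]

RULES: list[Rule] = [
    ("", "<div class='leftdiv'"),
    ("<div class='rightdiv", ""),
    ("", "<div id='imageFlowContainerDiv'"),
    ("<div id='searchEngineFooterDiv'>", ""),
]


def excluded_spans(html: str, rules: list[Rule]) -> list[tuple[int, int]]:
    spans = []
    for from_text, to_text in rules:
        start = html.find(from_text) if from_text else 0
        if start == -1:
            continue  # From text not on this page: the rule does nothing
        if to_text:
            end = html.find(to_text, start)
            if end == -1:
                continue  # To text missing after From: skip the rule
        else:
            end = len(html)
        spans.append((start, end))
    return spans


def strip_page_parts(html: str, rules: list[Rule]) -> str:
    """Remove every excluded span, merging overlaps, and keep the rest."""
    kept, pos = [], 0
    for start, end in sorted(excluded_spans(html, rules)):
        if start > pos:
            kept.append(html[pos:start])
        pos = max(pos, end)
    kept.append(html[pos:])
    return "".join(kept)


# A made-up page; it has no imageFlowContainerDiv, so rule 3 is skipped.
page = (
    "<html><nav>menu</nav>"
    "<div class='leftdiv'><img src='full.jpg'></div>"
    "<div class='rightdiv'>related</div>"
    "<div id='searchEngineFooterDiv'>footer</div></html>"
)
print(strip_page_parts(page, RULES))
# prints: <div class='leftdiv'><img src='full.jpg'></div>
```

In this model only the block holding the full-size image survives, so links found in the menu, the related-items column and the footer would no longer be followed.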

Re: Fine Art America - fineartamerica.com

Posted: 26 Aug 2024, 17:59
by jojo1000
Wow, your reply by itself seemed like a training session. Thank you so much.

I'll try all of this and respond to you.

Thanks again.