Fine Art America - fineartamerica.com
-
- Posts: 3
- Joined: 25 Aug 2024, 20:59
Fine Art America - fineartamerica.com
Could an expert kindly create a script for this awesome website please?
-
- Site Admin
- Posts: 2230
- Joined: 02 Mar 2009, 17:02
Re: Fine Art America - fineartamerica.com
Have you tried the built-in generic templates? If you want images of one product - use the "Download all images from a single-page gallery" template. If you want everything from this website - use the "Download all images from entire website" template. You can apply filters and limits to avoid downloading thumbnails and website design elements.
-
- Posts: 3
- Joined: 25 Aug 2024, 20:59
Re: Fine Art America - fineartamerica.com
Yes, thank you, I did try it. But it's downloading the lowest quality images available.
My goal is to download all images from let's say:
https://fineartamerica.com/art/photographs/roger+moore
The quality I am seeking for the software to download is:
https://fineartamerica.com/featured/spa ... chive.html
https://fineartamerica.com/featured/sha ... -ruck.html
How can I make this possible?
-
- Site Admin
- Posts: 2230
- Joined: 02 Mar 2009, 17:02
Re: Fine Art America - fineartamerica.com
Well, in this case, you'll need to use advanced configuration options. It's not going to be easy if you don't know a little bit of HTML. You need to limit the exploration only to the pages that you need. So, first of all, make sure you have [ Entire website ] selected in the [ Regular site ] section of the project properties. Then exclude all the pages of the website by adding
^https?://(www\.)?fineartamerica\.com
to the [ Excluded URLs ]
And then you'll have to add filters to the [ Included URLs ] that will allow the software to crawl only the pages with the full-size photos. Looks like adding
/featured/
is going to be enough. This will already do the job, and you'll get the full-size images (those that are shown on the website, not those that are hidden behind the "preview"). To get rid of the smaller-resolution images and website assets, you can also add the following filters to the [ Excluded URLs ]:
/assets/
/400/
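If it helps to see how these filters play together, here is a rough Python sketch. It is not the software's actual code: the helper name should_crawl and the rule that an [ Included URLs ] match overrides an [ Excluded URLs ] match are assumptions made for illustration, and the image host in the sample URLs is made up.

```python
import re

# Patterns copied from the post above.
EXCLUDED_URLS = [
    r"^https?://(www\.)?fineartamerica\.com",
    r"/assets/",
    r"/400/",
]
INCLUDED_URLS = [r"/featured/"]


def should_crawl(url: str) -> bool:
    """Hypothetical helper. Assumption: an include match wins over any exclude match."""
    if any(re.search(pattern, url) for pattern in INCLUDED_URLS):
        return True
    if any(re.search(pattern, url) for pattern in EXCLUDED_URLS):
        return False
    # URLs that match no filter at all are allowed in this sketch.
    return True


if __name__ == "__main__":
    samples = [
        "https://fineartamerica.com/featured/example-page.html",
        "https://fineartamerica.com/art/photographs/roger+moore",
        "https://images.example.invalid/assets/logo.png",  # made-up host
        "https://images.example.invalid/400/thumb.jpg",    # made-up host
    ]
    for url in samples:
        print(should_crawl(url), url)
```

In this sketch the listing page itself is rejected by the first exclude pattern, so it only makes sense if the crawler always loads the starting URL regardless of the filters, which is another assumption.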
And now, for the final touches, you can open the HTML source of the pages that you need to crawl and exclude the parts of those pages that do not contain the information you need by adding them to the [ Excluded Page Parts ]. That's what I came up with:
From1=
To1=<div class='leftdiv'
From2=<div class='rightdiv
To2=
From3=
To3=<div id='imageFlowContainerDiv'
From4=<div id='searchEngineFooterDiv'>
To4=
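To illustrate what the From/To pairs above are meant to do, here is a small Python sketch. It rests on a reading of the rules that the post does not spell out: an empty From means "from the start of the page", an empty To means "to the end of the page", the To marker itself is kept, and a rule whose marker is not found on a page is skipped. The function name strip_page_parts is hypothetical and this is not the software's actual implementation.

```python
# From/To pairs copied from the post above.
RULES = [
    ("", "<div class='leftdiv'"),
    ("<div class='rightdiv", ""),
    ("", "<div id='imageFlowContainerDiv'"),
    ("<div id='searchEngineFooterDiv'>", ""),
]


def strip_page_parts(html: str, rules=RULES) -> str:
    """Hypothetical helper: cut everything between each From marker and To marker."""
    for start_marker, end_marker in rules:
        # Empty From: cut from the very beginning of the page.
        start = 0 if start_marker == "" else html.find(start_marker)
        if start == -1:
            continue  # marker not on this page, skip the rule
        if end_marker == "":
            end = len(html)  # empty To: cut up to the end of the page
        else:
            end = html.find(end_marker, start + len(start_marker))
            if end == -1:
                continue
        # Drop the span; the To marker stays so the wanted block remains intact.
        html = html[:start] + html[end:]
    return html


if __name__ == "__main__":
    page = (
        "<html><nav>menu</nav>"
        "<div class='leftdiv'><img src='a.jpg'></div>"
        "<div class='rightdiv'>sidebar</div></html>"
    )
    print(strip_page_parts(page))
```

Under these assumptions, only the block that starts at the leftdiv marker survives on a page like the sample one, so images in the header, sidebar and footer are never seen by the downloader.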
-
- Posts: 3
- Joined: 25 Aug 2024, 20:59
Re: Fine Art America - fineartamerica.com
Wow, your reply by itself seemed like a training session. Thank you so much.
I'll try all of this and respond to you.
Thanks again.