-
shave1
- Posts: 5
- Joined: 13 Nov 2017, 06:34
by shave1 » 13 Nov 2017, 07:04
I saw an older question from 2016 on how to scrape 500px.com, but as of 2017 the site layout has changed. I'd like to know how to scrape it for high-res JPGs now.
- I'd like to auto create a subdirectory for each photographer, e.g. /alyabev/, /sergeybondarev/, etc.
- The high-res JPG from the above page should be named "234377907.jpg", so the resulting file will be "/sergeybondarev/234377907.jpg".
- The website uses AJAX, so using a browser, only the first 50 photos are loaded until you scroll toward the bottom. Of course, I am hoping I can scrape all photos on a page rather than the first 50 only.
Can someone help with the project setup? Thanks!
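To make the requested layout concrete, here is a minimal Python sketch of the two ideas above: one folder per photographer with files named after the photo ID, and collecting every photo rather than only the first batch. It is an illustration only, not the project template. The names `target_path`, `iter_all_photos` and `fetch_page` are hypothetical, and `fetch_page` stands in for whatever request returns one batch of photo IDs; how 500px actually serves those batches is not covered here.

```python
from pathlib import Path
from typing import Callable, Iterator, List


def target_path(root: Path, photographer: str, photo_id: str) -> Path:
    """Return <root>/<photographer>/<photo_id>.jpg, creating the folder if needed."""
    folder = root / photographer
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{photo_id}.jpg"


def iter_all_photos(fetch_page: Callable[[int], List[str]]) -> Iterator[str]:
    """Yield photo IDs batch by batch until a batch comes back empty.

    fetch_page is a hypothetical callable: given a 1-based page number,
    it returns the photo IDs in that batch, or an empty list when done.
    """
    page = 1
    while True:
        ids = fetch_page(page)
        if not ids:
            break
        yield from ids
        page += 1
```

With this, `target_path(Path("."), "sergeybondarev", "234377907")` gives `sergeybondarev/234377907.jpg`, and the loop keeps requesting batches instead of stopping at the first 50 photos.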
-
shave1
- Posts: 5
- Joined: 13 Nov 2017, 06:34
by shave1 » 18 Nov 2017, 07:26
Very nice!
Just a minor issue: why are some files downloaded multiple times, with 001, 002 endings?
For example: https://500px.com/frbr
-
Maksym
- Site Admin
- Posts: 2077
- Joined: 02 Mar 2009, 17:02
by Maksym » 20 Nov 2017, 12:52
You can modify the template and select the "Do not download and save new file if file size is the same" option in the [Save -> Conflicts] section. That should prevent the duplicate copies.
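For anyone curious what a size-based conflict rule amounts to, here is a rough Python sketch of the idea. It is not the program's actual implementation: the function name `save_unless_same_size` and the `_001` suffix format are assumptions made for illustration. The rule is: if a file with the same name (or an already numbered copy) has the same size as the new data, skip it; otherwise save under the next free numbered name.

```python
from pathlib import Path
from typing import Optional


def save_unless_same_size(path: Path, data: bytes) -> Optional[Path]:
    """Write data to path unless a same-sized copy already exists.

    Returns the path that was written, or None if the save was skipped.
    """
    n = 1
    candidate = path
    while candidate.exists():
        if candidate.stat().st_size == len(data):
            return None  # same size: assume it is the same file and skip
        # Different size: try the next numbered name (suffix format assumed).
        candidate = path.with_name(f"{path.stem}_{n:03d}{path.suffix}")
        n += 1
    candidate.write_bytes(data)
    return candidate
```

Note that size equality is only a heuristic: two different images that happen to have the same byte size would be treated as one file.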