Hi
I recently found a small (but in some sense big) problem with the Save - Conflicts options in "auto" download.
It happens when I use "Do not download and save new file if file size is the same (if file size is different ...) "
Case 1: At least two files have the same name on the target page, but the "if file size is different" check happens only once, against the original file name such as a.jpg.
Case 2: If the first downloaded file is broken, it naturally falls into the "file size is different" situation, which creates a-001 (the same file, not broken).
Result: the second file with the same name (a-001) is then downloaded again and again, as a-001, a-002, a-003, a-004 and so on.
They are not recognized as duplicates because the tool does not run the conflict check against the -001, -002 numbered files, even if they are the same size.
So it would be perfect if the conflict check could also cover a-001, a-002, not just a.jpg.
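To make the request concrete, here is a rough Python sketch of the check I mean. It is only an illustration and not how the program works internally: the function name `find_same_size_sibling` is made up, and I am assuming the numbered copies are always named like `a-001.jpg` (the original name, a dash, then digits).

```python
import re
from pathlib import Path
from typing import Optional


def find_same_size_sibling(target: Path, incoming_size: int) -> Optional[Path]:
    """Return an existing file (a.jpg, a-001.jpg, a-002.jpg, ...) whose size
    equals incoming_size, or None if no such file exists.

    Assumption: numbered copies are named "<stem>-<digits><suffix>".
    """
    # Matches "a.jpg" as well as "a-001.jpg", "a-002.jpg", ...
    pattern = re.compile(
        re.escape(target.stem) + r"(-\d+)?" + re.escape(target.suffix)
    )
    for candidate in sorted(target.parent.iterdir()):
        if (
            candidate.is_file()
            and pattern.fullmatch(candidate.name)
            and candidate.stat().st_size == incoming_size
        ):
            return candidate
    return None
```

If this returns a file, the new download could be skipped as a duplicate. If it returns None, the file would be saved under the next free number as it is now.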
Thank you
[suggestion] Save - Conflicts improvement
-
- Site Admin
- Posts: 2084
- Joined: 02 Mar 2009, 17:02
Re: [suggestion] Save - Conflicts improvement
Quote: "case1 : At least two files have the same name on target page but "if file size is different" check happen only once with original file name like a.jpg"
That's right. All options in "Conflicts" are only applied if the file name is the same as an already downloaded one. Otherwise there is no "file name conflict".
Quote: "case2 : If the first downloaded file is broken, of course it goes "file size is different" situation. making a-001 (not broken same file)"
But isn't it great?! You get the "unbroken" file in the end!
I think the problem can be solved using filters. It looks like you have a page with a lot of repeated files, but I'm not sure. Can you give me the actual URL and tell me what you need to download?
-
- Posts: 16
- Joined: 03 Aug 2017, 20:10
Re: [suggestion] Save - Conflicts improvement
My epr files.
http://sendanywhe.re/7TR4BZVU - epr1: pages updated daily
http://sendanywhe.re/G4RE4FIV - epr2: pages updated daily + Posts that continue to add images
And I'm sorry, I didn't mention that I always use the restart option with the task scheduler. (I don't use update for some reasons.)
The problem with cases 1 and 2 is that I have to restart the daily-updated pages every day, so duplicates continue to pile up.
For example, if a broken file occurs in the middle of the epr2 task, the file is downloaded again every day, e.g. a-002, a-003.
http://gall.dcinside.com/board/view/?id ... no=7845048
A test page with files that share the same name.
Thank you
-
- Site Admin
- Posts: 2084
- Joined: 02 Mar 2009, 17:02
Re: [suggestion] Save - Conflicts improvement
How about this: use advanced file naming that includes parts of the URLs, along with the "overwrite duplicates" option? You WILL lose the original file names in this case, but it will certainly help with your problem because two different files cannot have the same URL. As in the example you gave me, you could use the value of the "no" parameter to create the file name and prevent duplicate file names completely. You can also use regular expressions that are matched against the file URL or parent URL to create file names.
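To illustrate the idea outside the program, here is a small Python sketch of URL-based naming. It does not use the program's own naming syntax. The function `name_from_url`, the `example.com` URLs, and the choice to append a short hash of the file URL are all assumptions made for this example.

```python
import hashlib
import posixpath
from urllib.parse import parse_qs, urlparse


def name_from_url(parent_url: str, file_url: str, param: str = "no") -> str:
    """Build a file name from the parent page's query parameter and the file URL.

    Hypothetical scheme: "<value of param>_<short hash of file URL><extension>".
    The same URL always yields the same name, so re-downloading it overwrites
    the earlier copy instead of creating a numbered one.
    """
    query = parse_qs(urlparse(parent_url).query)
    post_id = query.get(param, ["unknown"])[0]
    # Keep only the extension of the original name; the name itself is dropped.
    extension = posixpath.splitext(urlparse(file_url).path)[1]
    digest = hashlib.sha1(file_url.encode("utf-8")).hexdigest()[:12]
    return f"{post_id}_{digest}{extension}"


# Made-up URLs: two files called a.jpg on the same post get different names.
page = "http://example.com/board/view/?id=demo&no=123"
print(name_from_url(page, "http://example.com/img/1/a.jpg"))
print(name_from_url(page, "http://example.com/img/2/a.jpg"))
```

Because each name depends only on the URLs, a broken download and its later good copy end up under the same name, which is what lets "overwrite duplicates" replace the broken one.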
-
- Posts: 16
- Joined: 03 Aug 2017, 20:10
Re: [suggestion] Save - Conflicts improvement
That's a good idea for the future, but almost 334GB is already downloaded.
Is there no way to check the numbered (-001, -002) duplicates?
-
- Site Admin
- Posts: 2084
- Joined: 02 Mar 2009, 17:02
Re: [suggestion] Save - Conflicts improvement
Nope, sorry :(
-
- Posts: 16
- Joined: 03 Aug 2017, 20:10
Re: [suggestion] Save - Conflicts improvement
Never mind. Maybe I can run a deduplication program on it.
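For anyone with the same problem, something like the following Python sketch could list the numbered duplicates. It is only a rough idea, not a tested tool: the function names are made up, it assumes copies are named like `a-001.jpg`, and it only reports groups without deleting anything.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path
from typing import List

# Assumption: numbered copies end with a dash and digits, e.g. "a-001".
NUMBER_SUFFIX = re.compile(r"-\d+$")


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def find_numbered_duplicates(folder: Path) -> List[List[Path]]:
    """Return groups of files that share a base name and have identical content."""
    by_name_and_size = defaultdict(list)
    for path in sorted(folder.iterdir()):
        if path.is_file():
            base = NUMBER_SUFFIX.sub("", path.stem) + path.suffix
            by_name_and_size[(base, path.stat().st_size)].append(path)

    groups = []
    for paths in by_name_and_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot be a duplicate, so skip hashing
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(group for group in by_hash.values() if len(group) > 1)
    return groups
```

Comparing sizes first keeps the number of files that need hashing small, which matters with hundreds of gigabytes. A broken a.jpg has a different size from a-001 and a-002, so it lands in its own group and is left alone.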
Thank you.