[suggestion] Save - Conflicts improvement

KAIAD
Posts: 16
Joined: 03 Aug 2017, 20:10

[suggestion] Save - Conflicts improvement

Post by KAIAD » 09 Dec 2018, 06:07

Hi

I recently found a small (but in some sense big) problem with the Save - Conflicts settings in "auto" download.
It happens when I use "Do not download and save new file if file size is the same (if file size is different ...)".

Case 1: At least two files have the same name on the target page, but the "if file size is different" check happens only once, against the original file name (e.g. a.jpg).
Case 2: If the first downloaded file is broken, the check of course ends up in the "file size is different" situation and creates a-001 (an unbroken copy of the same file).

Result: the second file with the same name (a-001) is then downloaded again and again, as a-001, a-002, a-003, a-004 and so on.
These copies are not recognized as duplicates because the tool does not run the conflict check against the numbered files (-001, -002, ...), even when they are the same size.


So it would be perfect if the conflict check could also compare against a-001, a-002 and so on,
not just a.jpg.
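
To make the request concrete, here is a minimal sketch of such a check in Python. It is not how the program works internally; the function names are hypothetical, and it assumes the numbered copies always use a three-digit suffix such as a-001.jpg, as in the examples above.

```python
import re
from pathlib import Path


def numbered_variants(directory: Path, filename: str) -> list[Path]:
    """Return existing files named like `filename` or its numbered copies.

    For "a.jpg" this matches a.jpg, a-001.jpg, a-002.jpg, ...
    (three-digit suffix is an assumption taken from the examples).
    """
    stem, suffix = Path(filename).stem, Path(filename).suffix
    pattern = re.compile(re.escape(stem) + r"(?:-\d{3})?" + re.escape(suffix))
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and pattern.fullmatch(p.name)
    )


def is_duplicate(directory: Path, filename: str, incoming_size: int) -> bool:
    """True if the original OR any numbered copy already has this size."""
    return any(
        p.stat().st_size == incoming_size
        for p in numbered_variants(directory, filename)
    )
```

With a broken a.jpg and a good a-001.jpg on disk, a new download of the good file would be skipped because its size matches a-001.jpg, instead of being saved as a-002.jpg.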

Thank you

Maksym
Site Admin
Posts: 2077
Joined: 02 Mar 2009, 17:02

Re: [suggestion] Save - Conflicts improvement

Post by Maksym » 10 Dec 2018, 13:22

Case 1: At least two files have the same name on the target page, but the "if file size is different" check happens only once, against the original file name (e.g. a.jpg).
That's right. All options in "Conflicts" are applied only if the file name is the same as an already downloaded one. Otherwise there is no "file name conflict".
Case 2: If the first downloaded file is broken, the check of course ends up in the "file size is different" situation and creates a-001 (an unbroken copy of the same file).
But isn't it great?! You get the "unbroken" file in the end!

I think the problem can be solved using filters. It looks like you have a page with a lot of repeated files, but I'm not sure. Can you give me the actual URL and tell me what you need to download?

KAIAD
Posts: 16
Joined: 03 Aug 2017, 20:10

Re: [suggestion] Save - Conflicts improvement

Post by KAIAD » 11 Dec 2018, 03:50

My epr files.

http://sendanywhe.re/7TR4BZVU - epr1: pages updated daily
http://sendanywhe.re/G4RE4FIV - epr2: pages updated daily + Posts that continue to add images

And I'm sorry, I didn't mention that I always use the restart option with the task scheduler. (I don't use update, for various reasons.)
The problem with cases 1 and 2 is that I have to restart the daily-updated pages every day, so duplicates keep piling up.
For example, if a broken file occurs in the middle of the epr2 task, that file is downloaded again every day, e.g. as a-002, a-003.

http://gall.dcinside.com/board/view/?id ... no=7845048
A test page with files that share the same name.

Thank you

Maksym
Site Admin
Posts: 2077
Joined: 02 Mar 2009, 17:02

Re: [suggestion] Save - Conflicts improvement

Post by Maksym » 11 Dec 2018, 12:54

How about this: use advanced file naming that includes parts of the URLs, along with the "overwrite duplicates" option? You WILL lose the original file names in this case, but it will certainly help with your problem because two different files cannot have the same URL. As in the example you gave me, you could use the value of the "no" parameter to create the file name and prevent duplicate file names completely. You can also use regular expressions, matched against the file URL or the parent URL, to create file names.
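
As a rough illustration of the idea (in Python, not in the program's own naming-template syntax), the sketch below builds a file name from the "no" parameter of the parent page URL plus a short hash of the file URL. The function name, the example.com URLs and the hash-based suffix are assumptions made for this example only.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlparse


def name_from_urls(file_url: str, parent_url: str, param: str = "no") -> str:
    """Build a stable file name from URL parts instead of the original name."""
    # Value of the query parameter on the parent page, e.g. no=7845048.
    post_id = parse_qs(urlparse(parent_url).query).get(param, ["unknown"])[0]
    # Short hash of the file URL keeps several files from one page apart.
    url_hash = hashlib.sha1(file_url.encode("utf-8")).hexdigest()[:10]
    # Keep the original extension so the file still opens normally.
    extension = PurePosixPath(urlparse(file_url).path).suffix
    return f"{post_id}_{url_hash}{extension}"


# Hypothetical URLs, for illustration only.
parent = "http://example.com/board/view/?id=demo&no=7845048"
print(name_from_urls("http://example.com/img/a.jpg", parent))
print(name_from_urls("http://example.com/img2/a.jpg", parent))
```

The same file URL always maps to the same name, so a restarted task overwrites its earlier copy instead of creating a-001, a-002 and so on, while two files that are both called a.jpg but live at different URLs get different names.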

KAIAD
Posts: 16
Joined: 03 Aug 2017, 20:10

Re: [suggestion] Save - Conflicts improvement

Post by KAIAD » 11 Dec 2018, 14:02

That's a good idea for the future, but almost 334GB has already been downloaded.

Is there no way to check the numbered duplicates?

Maksym
Site Admin
Posts: 2077
Joined: 02 Mar 2009, 17:02

Re: [suggestion] Save - Conflicts improvement

Post by Maksym » 11 Dec 2018, 14:05

Nope, sorry :(

KAIAD
Posts: 16
Joined: 03 Aug 2017, 20:10

Re: [suggestion] Save - Conflicts improvement

Post by KAIAD » 11 Dec 2018, 14:17

Never mind. Maybe I can run a deduplication program on it.
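
For reference, a deduplication pass of that kind could look roughly like the Python sketch below. It is only a sketch under assumptions: the function names are hypothetical, numbered copies are assumed to use a three-digit suffix (a-001.jpg), and it only lists redundant copies rather than deleting anything.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path

NUMBER_SUFFIX = re.compile(r"-\d{3}$")  # assumed "-001" style suffix


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_redundant_copies(directory: Path) -> list[Path]:
    """List numbered copies whose content repeats an earlier file."""
    groups = defaultdict(list)
    for path in sorted(directory.iterdir()):
        if path.is_file():
            # a-002.jpg and a.jpg share the base name "a.jpg".
            base = NUMBER_SUFFIX.sub("", path.stem) + path.suffix
            groups[(base, path.stat().st_size)].append(path)

    redundant = []
    for paths in groups.values():
        if len(paths) < 2:
            continue  # unique size for this base name, nothing to compare
        seen = set()
        for path in paths:
            digest = file_digest(path)  # confirm equal size means equal content
            if digest in seen:
                redundant.append(path)
            else:
                seen.add(digest)
    return redundant
```

Grouping by base name and size first means only files that could be duplicates are hashed, which matters for a collection of several hundred gigabytes. Reviewing the returned list before removing anything is the safer way to use it.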

Thank you.
