
I do not want to download the downloaded file again.

Posted: 05 Nov 2017, 20:03
by wcolor
As the title says.


I want EPF to remember files once they have been downloaded.
At the moment, avoiding a re-download seems to require keeping the file on disk instead of deleting it.

Also, even if I keep the image files, it is pointless to check the same album URL every time. Pages that get updated still need to be checked periodically. However, on sites such as simple album hosts, EPF should skip re-checking albums it has already visited, which would speed things up and reduce the load on the site.

Re: I do not want to download the downloaded file again.

Posted: 06 Nov 2017, 12:40
by Maksym
Can I have sample URLs please for the tests?

Re: I do not want to download the downloaded file again.

Posted: 06 Nov 2017, 15:46
by wcolor
Maxim wrote:
06 Nov 2017, 12:40
Can I have sample URLs please for the tests?
It applies to all URLs.
Intuitively, I would expect EPF to save the download date/time and the URL in persistent storage such as SQLite, look up each URL there before downloading, and start the download only when no record is found.
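As a rough illustration of the lookup described above, here is a minimal Python sketch using the standard `sqlite3` module. It is not how EPF works internally; the table name `downloads` and the helpers `open_history`, `record_download` and `maybe_download` are hypothetical, and `fetch` stands in for whatever actually saves the file.

```python
import sqlite3
from datetime import datetime, timezone

def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical download-history database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        " url TEXT PRIMARY KEY,"
        " downloaded_at TEXT NOT NULL)"
    )
    return conn

def already_downloaded(conn: sqlite3.Connection, url: str) -> bool:
    """True if a record for this URL exists."""
    row = conn.execute("SELECT 1 FROM downloads WHERE url = ?", (url,)).fetchone()
    return row is not None

def record_download(conn: sqlite3.Connection, url: str) -> None:
    """Store the URL together with the current UTC date and time."""
    conn.execute(
        "INSERT OR REPLACE INTO downloads (url, downloaded_at) VALUES (?, ?)",
        (url, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

def maybe_download(conn: sqlite3.Connection, url: str, fetch) -> bool:
    """Call fetch(url) only when no record is found; return True if it ran."""
    if already_downloaded(conn, url):
        return False
    fetch(url)
    record_download(conn, url)
    return True
```

Because the record lives in the database rather than in the save folder, it survives even if the downloaded file is later deleted (as long as a file path, not `":memory:"`, is used).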

Re: I do not want to download the downloaded file again.

Posted: 06 Nov 2017, 17:04
by Maksym
Well, if you start your projects with the "Resume" or "Update" option, EPF should not download the same files again. It only does so if you use the "Restart" option.

But on some video/image hosting websites, the URLs of the actual video/image files are regenerated on every visit and differ each time, so this won't work. EPF will re-download such files over and over again every time you restart the project, whichever option you use.
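To show why a URL-based record cannot help in that case, the minimal sketch below uses two made-up URLs for the same image that differ only in a per-visit token (the host and the `token` parameter are hypothetical). A check keyed on the exact URL string treats them as two different files.

```python
def is_new(url: str, seen: set) -> bool:
    """URL-keyed check: True (and remember the URL) if this exact string is unseen."""
    if url in seen:
        return False
    seen.add(url)
    return True

# Hypothetical: the same image served with a freshly generated token on each visit.
visit_1 = "https://img.example.com/album/42/photo.jpg?token=abc123"
visit_2 = "https://img.example.com/album/42/photo.jpg?token=xyz789"

seen: set = set()
print(is_new(visit_1, seen))  # True: first visit, so it is downloaded
print(is_new(visit_2, seen))  # True: a different string, so it is downloaded again
```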

That's why I asked for sample URLs to find out if this is the case.

Re: I do not want to download the downloaded file again.

Posted: 07 Nov 2017, 17:34
by wcolor
Maxim wrote:
06 Nov 2017, 17:04
Well, if you start your projects with the "Resume" or "Update" option, EPF should not download the same files again. It only does so if you use the "Restart" option.
Wow, that is very nice behavior.
Since I worry about re-downloading, I will back up the entire EPF configuration frequently.
For example, how does EPF behave on a paginated website where the latest content is displayed on the first page?
Ideally, EPF would crawl the pages but skip images that have already been downloaded.
The important thing here is not merely avoiding duplicates while the files are still present in the save destination.
It is never downloading an image that was actually downloaded once, even if the file has since been deleted. EPF should check the image URL and Content-Length (or perhaps just the URL).
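The URL + Content-Length check proposed above could look roughly like the Python sketch below. It is only an illustration, not EPF's behavior: `DownloadHistory` and `fetch_content_length` are hypothetical names, the size is assumed to come from the `Content-Length` header of an HTTP HEAD request (servers may omit it, giving `None`), and the history is kept in memory, whereas a real tool would persist it (for example in SQLite, as suggested earlier in the thread).

```python
import urllib.request
from typing import Optional, Set, Tuple, Union

Key = Union[str, Tuple[str, Optional[int]]]

def fetch_content_length(url: str, timeout: float = 10.0) -> Optional[int]:
    """Ask the server for the file size with a HEAD request (None if not sent)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        value = resp.headers.get("Content-Length")
    return int(value) if value is not None else None

class DownloadHistory:
    """Remembers downloaded images independently of the files on disk."""

    def __init__(self, url_only: bool = False) -> None:
        self.url_only = url_only  # True: key on the URL alone
        self._seen: Set[Key] = set()

    def _key(self, url: str, content_length: Optional[int]) -> Key:
        return url if self.url_only else (url, content_length)

    def should_download(self, url: str, content_length: Optional[int]) -> bool:
        """True if this (URL, size) pair, or URL alone, has no record yet."""
        return self._key(url, content_length) not in self._seen

    def record(self, url: str, content_length: Optional[int]) -> None:
        self._seen.add(self._key(url, content_length))
```

Keying on the size as well means a file replaced under the same URL with a different size is fetched again; with `url_only=True` the URL alone decides.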

Re: I do not want to download the downloaded file again.

Posted: 07 Nov 2017, 17:44
by Maksym
If you need to re-crawl pages to find newly added content without re-downloading already saved files, use the "Update" option. It works exactly as you described.
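For readers who want a mental model of the behavior described here, this is a minimal Python sketch of an update-style crawl over a paginated site. It is an assumption-laden illustration, not EPF's code: `update_crawl` is a hypothetical name, each page is modeled as an already-parsed list of image URLs (page 1 first, newest content), and `fetch` stands in for the actual download.

```python
from typing import Callable, Iterable, List, Set

def update_crawl(
    pages: Iterable[List[str]],
    downloaded: Set[str],
    fetch: Callable[[str], None],
) -> List[str]:
    """Visit every page, downloading only image URLs not yet in `downloaded`."""
    new_files: List[str] = []
    for image_urls in pages:
        for url in image_urls:
            if url in downloaded:
                continue  # saved on an earlier run: skip it
            fetch(url)
            downloaded.add(url)
            new_files.append(url)
    return new_files
```

This sketch deliberately visits all pages rather than stopping at the first known image, since it does not assume the site lists content strictly newest-first.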