I'm a recent buyer and am having issues building a decent workflow with EPF for images from civitai.com. I'm hoping someone can spot a simple mistake I'm making that will improve said workflow. Normally I use external tools to mitigate these sorts of things, but I am struggling in this case.
I'm using the civitai template (the civitai.com downloader), specifically the https://civitai.com/user/UserName/images functionality. The idea is to grab everything from a creator, and then to be able to go back and grab newer images. In theory this seems to work, but I am having some issues.
I've already added hundreds of creators to EPF, each with their own project. I have not had any issues with initially downloading images from their image pages.
-------------------------------
Problem #1
There seems to be no way to bulk update projects if you want them all to have a new setting that you have decided upon. I did find a workaround to this for *some* settings thanks to Notepad++, but it only works for settings that are exactly identical between them (example: setting ResolveNamingConflictsType=rncDoNotSave in all .epr files), not for something like title or destination folder.
Problem #1b
This means that I cannot easily change all of these projects to download to a folder of {UserName} or even civitai-UserName. I get stuck with these more verbose folder names: "civitai.com - user-{UserName} - images". I can do that processing after the fact, but it leads to other issues as I will mention.
Problem #1c
This also means that I cannot bulk change all of these projects to insert a filename prefix of "{UserName}-" to each downloaded file. I can also do that processing after the fact, but it leads to other issues as I will mention.
Given that I already have 500+ projects, updating them manually sounds like hell, and having to make some of these changes by hand each time I create a new project is less than ideal. I looked at editing the template, but that does not seem to address issues 1b and 1c in particular. Maybe I am missing something.
Problem #2
While EPF is able to skip downloading files when there is an exact match based on filename & size, it seems to lack the ability to skip downloading files that have been downloaded before when they are no longer present in the download folder.
So, this problem is slightly exacerbated by problems 1b and 1c. I moved all of my initial downloads to another folder, changed all of the folder names to {UserName}, ran dupeguru to purge all duplicate files (even across different extensions, since file contents have been identical between jpg/jpeg/png in all or most cases), and then used a bulk renamer program to prepend "{FolderName}-" to the name of every file inside the folders.
Now, when I use the "Update" function in EPF, it re-downloads old images that had been downloaded previously. Oddly, it doesn't seem to *always* do this, which adds to my confusion, but I have a theory. Today I ran a test. I updated a project and it downloaded 7234 files from one page. I used dupeguru to wipe out all the newly downloaded duplicates, bringing the file count down to 979. So there were 979 unique pictures added to their page between my initial download and the update. I left those files in the download folder and immediately ran the update again, and the folder ended up with 4894 files. Running dupeguru again brought the file count back down to 979!
This makes it seem like EPF might actually be trying to skip previously downloaded images, but maybe something about the matching isn't working? I noticed that EPF's programdata folder has data files for each project that seem to detail previously downloaded files, so maybe it really is trying to skip them, but I don't know. If that's the case, my only theory as to the cause of the above inconsistency is that it avoids re-downloads based on exact URL rather than file information, and maybe civitai has different CDNs for hosting these images, leading to EPF downloading duplicate files from distinct URLs on future runs. But I might be wrong there.
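For reference, the renaming step described above can be made safe to re-run after every update. This is only a minimal sketch, assuming one sub-folder per creator under a common root; the function name `prefix_files_with_folder_name` and the folder layout are illustrative assumptions, not anything EPF provides.

```python
from pathlib import Path


def prefix_files_with_folder_name(root: Path) -> int:
    """Rename root/<Folder>/<file> to root/<Folder>/<Folder>-<file>.

    Files that already carry the prefix are skipped, so the function can
    be re-run after each update. Returns the number of files renamed.
    ASSUMPTION: one sub-folder per creator directly under `root`.
    """
    renamed = 0
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        prefix = f"{folder.name}-"
        # Materialize the listing first so renames do not disturb iteration.
        for file in sorted(p for p in folder.iterdir() if p.is_file()):
            if file.name.startswith(prefix):
                continue  # already prefixed on an earlier run
            target = file.with_name(prefix + file.name)
            if target.exists():
                continue  # never overwrite; leave the clash for manual review
            file.rename(target)
            renamed += 1
    return renamed
```

Note that renamed files no longer match what EPF saved, so this does not stop EPF from fetching them again; it only keeps the post-processing repeatable.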
-------------------------------------
gallery-dl, which does not support civitai, handles these issues on other websites such as DeviantArt. It can skip all previously downloaded files on subsequent runs thanks to a database of all downloads, and since it parses those pages in exact order, it can abort the download after a few duplicates are skipped in a row.
Anyway, with EPF, these issues lead to a download and file management trainwreck for me. Updating hundreds of projects takes an eternity even if no new images have been posted. Deduplicating and post-processing (renaming) will get harder and harder over time. Project updates use up tons of bandwidth re-downloading multiple copies of files that have already been downloaded, and there doesn't seem to be a great way for me to identify which downloads are truly new and which are duplicates of old downloads (without deleting the newly downloaded duplicates, thereby queuing those same files to download AGAIN on the next update!).
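To make the behaviour I mean concrete, here is a rough sketch of the "archive plus abort after consecutive skips" logic. It is not gallery-dl's code: the `download_new` function, the `seen` table, and the `fetch` callback are all made up for illustration, and it assumes items arrive newest first.

```python
import sqlite3
from typing import Callable, Iterable


def download_new(
    keys: Iterable[str],
    archive: sqlite3.Connection,
    fetch: Callable[[str], None],
    abort_after: int = 3,
) -> list[str]:
    """Fetch items not yet recorded in the archive.

    `keys` must be ordered newest first. Once `abort_after` already-seen
    items occur in a row, the rest are assumed to be old and the run stops.
    """
    archive.execute("CREATE TABLE IF NOT EXISTS seen (key TEXT PRIMARY KEY)")
    downloaded = []
    skipped_in_a_row = 0
    for key in keys:
        row = archive.execute(
            "SELECT 1 FROM seen WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            skipped_in_a_row += 1
            if skipped_in_a_row >= abort_after:
                break  # everything further down was fetched on earlier runs
            continue
        skipped_in_a_row = 0
        fetch(key)  # hypothetical downloader callback
        archive.execute("INSERT INTO seen (key) VALUES (?)", (key,))
        archive.commit()
        downloaded.append(key)
    return downloaded
```

The key point is that the archive records a stable identifier per item rather than relying on files still being present in the download folder.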
I should also echo that the issue identified at viewtopic.php?t=11377 is a bit of a pain, but I am able to mitigate it via Dupeguru, even if it does then contribute to problem #2.
Oh, and I guess I should mention that I'm also running into an issue every now and then where the program crashes and disappears. I cannot re-open the program. Task manager will show multiple instances of it running, and I can kill all but 1. I've tried some command line shenanigans, but ultimately only a reboot has resolved it and allowed me to re-open EPF.
All of that being said, this program is really cool and I don't regret my purchase. I'm just trying to get it to fit into my workflow.
Some issues I am having with EPF
Site Admin wrote:
Re: Some issues I am having with EPF
Quote: "Problem #1: There seems to be no way to bulk update projects if you want them all to have a new setting that you have decided upon. I did find a workaround to this for *some* settings thanks to Notepad++, but it only works for settings that are exactly identical between them (example: setting ResolveNamingConflictsType=rncDoNotSave in all .epr files), not for something like title or destination folder."
Yes, right now using Notepad++ is the best way to bulk-update project settings. And you can actually update all the settings: you can replace the percent-encoded value (%43%3A%5C%55%73%65...) with plain text:
DestinationFolder=C:\EPF\Project #1
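If Notepad++ gets tedious for per-project values, the same edit could be scripted. The following is only a sketch under several assumptions: that .epr files are UTF-8 text with one Key=Value pair per line, that the current DestinationFolder (once percent-decoded) contains the "user-{UserName} - images" pattern mentioned earlier in the thread, and that the program accepts the plain-text value as described above. The function names and the `base_folder` parameter are hypothetical. Back up the projects before trying anything like this.

```python
import re
from pathlib import Path
from urllib.parse import unquote

# ASSUMPTION: folder names look like "civitai.com - user-{UserName} - images".
USER_PATTERN = re.compile(r"user-(.+?) - images")


def rewrite_destination(epr_text: str, base_folder: str) -> str:
    """Return epr_text with DestinationFolder set to base_folder\\UserName."""
    out = []
    for line in epr_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep the original line ending
        key, sep, value = body.partition("=")
        if sep and key == "DestinationFolder":
            decoded = unquote(value)  # "%43%3A%5C" decodes to "C:\"
            match = USER_PATTERN.search(decoded)
            if match:
                body = f"DestinationFolder={base_folder}\\{match.group(1)}"
        out.append(body + ending)
    return "".join(out)


def update_projects(projects_dir: Path, base_folder: str) -> None:
    """Apply rewrite_destination to every .epr file, keeping a .bak copy."""
    for epr in projects_dir.glob("*.epr"):
        original = epr.read_bytes().decode("utf-8")  # ASSUMPTION: UTF-8
        updated = rewrite_destination(original, base_folder)
        if updated != original:
            backup = epr.with_name(epr.name + ".bak")
            backup.write_bytes(original.encode("utf-8"))
            epr.write_bytes(updated.encode("utf-8"))
```

Lines that do not match the expected pattern are left untouched, so projects with other destination folders are not modified.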
I've been thinking about this problem and I don't see a simple way to mass-update one selected option in a number of projects. How is the program supposed to know which option you'd like to update? It would require a new window where users first select the option(s) they'd like to update and then enter a new value for each.
Quote: "This makes it seem like EPF might actually be trying to skip previously downloaded images, but maybe something about the matching isn't working? I noticed that EPF's programdata folder has data files for each project that seem to detail previously downloaded files, so maybe it really is trying to skip them, but I don't know. If that's the case, my only theory as to the cause of the above inconsistency is that it avoids re-downloads based on exact URL rather than file information, and maybe civitai has different CDNs for hosting these images, leading to EPF downloading duplicate files from distinct URLs on future runs. But I might be wrong there."
That's most likely it. EPF uses a list of URLs from previous runs of the project and only skips the files that have the exact same URL. So if the URL changes between runs, EPF will "think" that it has found a new file and re-download it. Plus, it does re-download all the pages because there is no other way to make sure nothing was added to them between runs: a lot of websites "add" images to existing pages, and if EPF didn't re-download those pages it would not find the new images.
In cases like this I suggest clearing the project's Destination folder and starting a complete re-download using the [ Restart ] button. Yes, I know it consumes traffic and time, but right now EPF doesn't have any way to "recognize" the same files from different URLs.
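Until then, recognizing the same file from a different URL has to happen outside the program. A minimal sketch of one way to do that, assuming duplicates are byte-identical (as the dupeguru results in this thread suggest): keep a manifest of content hashes, so a fresh batch can be split into new files and repeats even after the older copies have been moved elsewhere. The names `file_digest` and `split_new_and_seen` and the JSON manifest format are illustrative assumptions, not part of EPF.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def split_new_and_seen(
    folder: Path, manifest_path: Path
) -> tuple[list[Path], list[Path]]:
    """Split a freshly downloaded batch into (new_files, duplicates).

    The manifest is a JSON list of digests seen on earlier runs and is
    updated in place. Move the new files out of `folder` afterwards,
    otherwise the next run will report them as duplicates.
    """
    seen = set()
    if manifest_path.exists():
        seen = set(json.loads(manifest_path.read_text()))
    new_files, duplicates = [], []
    for file in sorted(p for p in folder.iterdir() if p.is_file()):
        if file == manifest_path:
            continue  # do not hash the manifest itself
        digest = file_digest(file)
        if digest in seen:
            duplicates.append(file)
        else:
            seen.add(digest)
            new_files.append(file)
    manifest_path.write_text(json.dumps(sorted(seen)))
    return new_files, duplicates
```

This does not save any traffic, since the files are still downloaded before they can be hashed; it only makes it possible to tell which downloads are truly new.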
Quote: "Oh, and I guess I should mention that I'm also running into an issue every now and then where the program crashes and disappears. I cannot re-open the program. Task manager will show multiple instances of it running, and I can kill all but 1. I've tried some command line shenanigans, but ultimately only a reboot has resolved it and allowed me to re-open EPF."
Yeah, built-in Chromium is to blame here :( It's a bit of a pain to work with it... Hopefully, I'll be able to fix that someday.