truecandid

Post Reply
madnauseam
Posts: 47
Joined: 14 Nov 2022, 21:48

truecandid

Post by madnauseam »

Hi,

Trying to create a custom template to run through all subforum threads and get the first comment links, but I am having a bit of trouble.

The forum is:

Code: Select all

https://truecandid.com/forum/5-leggings-yoga-pants/?sortby=start_date&sortdirection=desc
My template, besides the standard options like search "Entire Page", includes:

Included URLS:

Code: Select all

/uploads/monthly  <- All image files links have this string
/forum/5-leggings-yoga-pants/ <-forum page (to get subsequent pages)
/topic/([^/\?#]+)  <- thread links 
Excluded URLS:

Code: Select all

/members/
/tags/
/search/
/avatar/
/thumbs/
/flags/
/styles/
/tags/
/uploads/css
#
https://truecandid.com/
comment
#elcontrols
/uploads/
Excluded Page Parts:

Code: Select all

From000=
To000=</header>
From001=
To001=<ol class="ipsClear ipsDataList cForumTopicTable  cTopicList 
From002=
To002="forums.front.forum.topicRow">
From003=<div class="ipsDataItem_icon ipsPos_top">
To003=<div class="tthumb_wrap" data-ipslazyload="">
From004=alt="
To004="forums.front.forum.topicRow">
From005=alt="
To005=<li class="ipsPagination_page
From006=
To006=<div class="cPost_contentWrap">
From007=<div data-role="commentContent"
To007=>
From008=<div class="ipsMessage ipsMessage_info" id="bimHiddenContentRequires_both">
To008=</div>
From009=" data-rowid="
To009=<a href=
From010=<ul class="ipsPagination" id="
To010=<li class="ipsPagination_page ipsPagination_active">
From011=<li class="ipsPagination_last">
To011=
From012=<div class="ipsItemControls">
To012=
From013=<span class
To013=>
From014=<ul class="ipsList_reset ipsComment_tools">
To014=</ul>
From015=data-ipslightbox=""
To015=</a>

While using the test function, I was able to confirm the template successfully parses the forum pages, getting the links for each subforum and subsequent pages:

Code: Select all

				
					
				
					<a href="https://truecandid.com/topic/34980-love-vpl-on-the-teens/" 
				
					
				
					<a href="https://truecandid.com/topic/34978-hottie-blondie-teen-and-pal/" 
				
					
				
					<a href="https://truecandid.com/topic/34960-dollar-teen/" 
				
					
				
					<a href="https://truecandid.com/topic/34947-tight/" 
				
					
				
					<a href="https://truecandid.com/topic/34946-tightest-blonde/" 
				
					
				
					<a href="https://truecandid.com/topic/34944-vtl-hottie-in-lulus/" 
				
					
				
					<a href="https://truecandid.com/topic/34928-ugly-young-tard-in-cheap-leggings-pretends-to-be-a-thing/" 
				
					
				
					<a href="https://truecandid.com/topic/34921-dirty-street-chav-likes-cheap-skin-tight-leggings/" 
				
					
				
					<a href="https://truecandid.com/topic/34914-sweet-looking-ebony-perfect-round-jiggle/" 
				
					
				
					<a href="https://truecandid.com/topic/34913-petite-bubble-butt-at-chipotle/" 
				
					
				
					<a href="https://truecandid.com/topic/34903-same-sexy-fitgirl-different-day/" 
				
					
				
					<a href="https://truecandid.com/topic/34895-irish-teen-in-tight-leggings/" 
				
					
				
					<a href="https://truecandid.com/topic/34894-irish-teen-in-leggings/" 
				
					
				
					<a href="https://truecandid.com/topic/34892-fire-stuffs-9-%F0%9F%94%A5-always-free/" 
				
					
				
					<a href="https://truecandid.com/topic/34866-fire-stuffs-6-%F0%9F%94%A5-4parts-always-free/" 
				
					
				
					<a href="https://truecandid.com/topic/34836-young-pawg-shopping/" 
				
					
				
					<a href="https://truecandid.com/topic/34834-another-great-busted-slut-blonde/" 
				
					
				
					<a href="https://truecandid.com/topic/34825-free-full-movie-ridiculous-jiggleon-this-grey-teenie/" 
				
					
				
					<a href="https://truecandid.com/topic/34824-latina-girl-wearing-tight-shiny-lululemon-leggings/" 
				
					
				
					<a href="https://truecandid.com/topic/34812-young-asian-slim-milf/" 
				
					
				
					<a href="https://truecandid.com/topic/34796-mega-ass-incredible/" 
				
					
				
					<a href="https://truecandid.com/topic/34795-nice-busted-jiggle-ass/" 
				
					
				
					<a href="https://truecandid.com/topic/34794-zara-blonde-teen-black-lulus/" 
				
					
				
					<a href="https://truecandid.com/topic/34791-crazy-curvy-cheeks/" 
				
					
				
					<a href="https://truecandid.com/topic/34777-teen-in-teveo-leggings/"  ipsPagination_active"><a href="https://truecandid.com/forum/5-leggings-yoga-pants/?sortby=start_date&amp;sortdirection=desc" data-page="1">1</a></li>
			
				
					<li class="ipsPagination_page"><a href="https://truecandid.com/forum/5-leggings-yoga-pants/page/2/?sortby=start_date&amp;sortdirection=desc" data-page="2">2</a></li>
				
					<li class="ipsPagination_page"><a href="https://truecandid.com/forum/5-leggings-yoga-pants/page/3/?sortby=start_date&amp;sortdirection=desc" data-page="3">3</a></li>
				
					<li class="ipsPagination_page"><a href="https://truecandid.com/forum/5-leggings-yoga-pants/page/4/?sortby=start_date&amp;sortdirection=desc" data-page="4">4</a></li>
				
					<li class="ipsPagination_page"><a href="https://truecandid.com/forum/5-leggings-yoga-pants/page/5/?sortby=start_date&amp;sortdirection=desc" data-page="5">5</a></li>
				
					<li class="ipsPagination_page"><a href="https://truecandid.com/forum/5-leggings-yoga-pants/page/6/?sortby=start_date&amp;sortdirection=desc" data-page="6">6</a></li>
				
				<li class="ipsPagination_next"><a href="https://truecandid.com/forum/5-leggings-yoga-pants/page/2/?sortby=start_date&amp;sortdirection=desc" rel="next" data-page="2" data-ipstooltip="" _title="Next page">Next</a></li>
				

While also being able to retrieve the relevant image links for each thread (taken from the first thread from the previous list):

Code: Select all

		
		
			<p>
	<a class="ipsAttachLink ipsAttachLink_image" href="https://truecandid.com/uploads/monthly_2025_06/2087518100_LoveVPLontheTeens.jpg.542e1f64f9b5ff72bdad529c643ae5e7.jpg" data-fileid="207862" data-fileext="jpg" rel="" data-fullurl="https://truecandid.com/uploads/monthly_2025_06/2087518100_LoveVPLontheTeens.jpg.542e1f64f9b5ff72bdad529c643ae5e7.jpg" 
</p>

<p>
	<a href="https://truecandid.com/uploads/monthly_2025_06/1aaaaaa.jpg.e5a2d3ccba74b93d294b3b0c69836970.jpg" title="Enlarge image" data-wrappedlink="" <a href="https://truecandid.com/uploads/monthly_2025_06/1aaewf444str.jpg.8abb49ffb7671b7b52743c44b3af9d92.jpg" title="Enlarge image" data-wrappedlink="" <a class="ipsAttachLink ipsAttachLink_image" href="https://truecandid.com/uploads/monthly_2025_06/705098421_LoveVPLontheTeens!.jpg.f84f06587ac53e04950ea941f7bb42b2.jpg" data-fileid="207861" data-fileext="jpg" rel="" data-fullurl="https://truecandid.com/uploads/monthly_2025_06/705098421_LoveVPLontheTeens!.jpg.f84f06587ac53e04950ea941f7bb42b2.jpg" 
</p>

<p>
	<a href="https://truecandid.com/uploads/monthly_2025_06/1aaaaaa.jpg.e5a2d3ccba74b93d294b3b0c69836970.jpg" title="Enlarge image" data-wrappedlink="" <a href="https://truecandid.com/uploads/monthly_2025_06/1aaewf444str.jpg.8abb49ffb7671b7b52743c44b3af9d92.jpg" title="Enlarge image" data-wrappedlink="" <a class="ipsAttachLink ipsAttachLink_image" href="https://truecandid.com/uploads/monthly_2025_06/484941132_LoveVPLontheTeens22.jpg.2ff433f681dab54aa42c116cb5c77a07.jpg" data-fileid="207864" data-fileext="jpg" rel="" data-fullurl="https://truecandid.com/uploads/monthly_2025_06/484941132_LoveVPLontheTeens22.jpg.2ff433f681dab54aa42c116cb5c77a07.jpg" 
</p>

<p>
	<a href="https://truecandid.com/uploads/monthly_2025_06/1aaaaaa.jpg.e5a2d3ccba74b93d294b3b0c69836970.jpg" title="Enlarge image" data-wrappedlink="" <a href="https://truecandid.com/uploads/monthly_2025_06/1aaewf444str.jpg.8abb49ffb7671b7b52743c44b3af9d92.jpg" title="Enlarge image" data-wrappedlink="" <a class="ipsAttachLink ipsAttachLink_image" href="https://truecandid.com/uploads/monthly_2025_06/407868785_LoveVPLontheTeens2.jpg.262d247c7546c953efc405f9c64a344e.jpg" data-fileid="207863" data-fileext="jpg" rel="" data-fullurl="https://truecandid.com/uploads/monthly_2025_06/407868785_LoveVPLontheTeens2.jpg.262d247c7546c953efc405f9c64a344e.jpg" 
</p>

<p>
	<a href="https://truecandid.com/uploads/monthly_2025_06/1aaaaaa.jpg.e5a2d3ccba74b93d294b3b0c69836970.jpg" title="Enlarge image" data-wrappedlink="" <a href="https://truecandid.com/uploads/monthly_2025_06/1aaewf444str.jpg.8abb49ffb7671b7b52743c44b3af9d92.jpg" title="Enlarge image" data-wrappedlink="" 
</p>


			
		</div>

		
			
however, the download log is filled with stuff like this:

Code: Select all

https://truecandid.com/topic/34980-love-vpl-on-the-teens/#comment-2719983
https://truecandid.com/topic/34980-love-vpl-on-the-teens/?do=getLastComment
https://truecandid.com/topic/34980-love-vpl-on-the-teens/?do=reactComment&comment=2720224&reaction=10&csrfKey=4f32e99dbb6df8ffa1a0bc40b846d2b3
https://truecandid.com/topic/34980-love-vpl-on-the-teens/?do=reactComment&comment=2720224&reaction=9&csrfKey=4f32e99dbb6df8ffa1a0bc40b846d2b3
https://truecandid.com/topic/34980-love-vpl-on-the-teens/?do=reactComment&comment=2719983&reaction=1&csrfKey=4f32e99dbb6df8ffa1a0bc40b846d2b3
https://truecandid.com/topic/34921-dirty-street-chav-likes-cheap-skin-tight-leggings/?do=getLastComment
https://truecandid.com/topic/21614-amazing-jiggly-round-ass-on-pattern-leggins/#comments
https://truecandid.com/topic/34984-pigtail-teen-has-a-nice-ass-on-show-part-1/?do=getLastComment
Which adds thousands of unnecessary links and greatly increases parsing time.

Can you help me figure out whats wrong with the template?

Thank you
Maksym
Site Admin
Posts: 2434
Joined: 02 Mar 2009, 17:02

Re: truecandid

Post by Maksym »

First of all, let me tell you that I'm impressed with what you have created. Not very often do I see users go THAT deep into the project options.

Now the suggestions:

1. Remove the URL parameters (after and including the ? sign) from your Starting Address. You want all the content, what's the differentce in whiuch order you'll get it? But it will graely simplify your filters.

2. You excluded all addresses by adding "httрs://truecandid.com/" to the [ Excluded URLs ]. And that's a pretty good move for any forum. But it also makes the rest of the filters in the [ Excluded URLs ] section useless. So, I suggest using only one filter in the [ Excluded URLs ]:

Code: Select all

^https?://(www\.)?truecandid\.com
3. Now you need to create a list of the filters in the [ Included URLs ] section to allow only those parts of the website that you need Extreme Picture Finder to crawl. First, and the most obvious, is the filter to allow any thread:

Code: Select all

/topic/\d+-[^/\?#]+/$
Then you need to allow the thread and forum pages:

Code: Select all

/page/\d+/$
And then, of course, the pictures:

Code: Select all

/uploads/monthly[^#]+$
You do not need any filters for your Starting Address - it will be downloaded anyway.

Note the "$" at the end of the filters. This is done to prevent the Regular Expressions from matching the "junk" addresses. It means "the end of the line or the text" (which is the same when we are talking about the URLs).
madnauseam
Posts: 47
Joined: 14 Nov 2022, 21:48

Re: truecandid

Post by madnauseam »

Thank you, Maksym.

Also, great suggestions, much appreciated. Seems to be working just fine.
Post Reply