Links with extra "\\\"

madnauseam
Posts: 41
Joined: 14 Nov 2022, 21:48

Post by madnauseam »

Hi,

I was creating a custom template and ended up with a lot of links that end with multiple "\\\\" characters.
I believe the correct link is still parsed, but this generates a lot of needless extra links.
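Until the source of the stray characters is found, the duplicates can at least be recognised. If the junk is always a run of trailing backslashes (an assumption, not something confirmed in this thread), stripping it maps every variant back to one link. A minimal Python sketch; normalize_link, dedupe_links and the sample links are hypothetical:

```python
def normalize_link(url: str) -> str:
    """Strip any run of trailing backslashes (assumed to be the only junk)."""
    return url.rstrip("\\")


def dedupe_links(links: list[str]) -> list[str]:
    """Normalize every link and keep the first occurrence of each, in order."""
    seen: set[str] = set()
    result: list[str] = []
    for link in links:
        clean = normalize_link(link)
        if clean not in seen:
            seen.add(clean)
            result.append(clean)
    return result


if __name__ == "__main__":
    # Hypothetical links: one thread URL with 0, 1 and 3 trailing backslashes.
    raw = [
        "https://forum.candidgirls.io/t/example-thread/123",
        "https://forum.candidgirls.io/t/example-thread/123\\",
        "https://forum.candidgirls.io/t/example-thread/123\\\\\\",
    ]
    for link in dedupe_links(raw):
        print(link)  # printed once, without backslashes
```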

Do you have any idea what could be causing this issue?

Thanks

EDIT: I should have added a bit more info.


Take this link, for example (NSFW):

https://forum.candidgirls.io/t/fit-latina-in-onepiece-flex-gif/499923

In the Excluded links I have:

^https?://(www\.)?candidgirls\.io
https://forum.candidgirls.io/privacy
https://forum.candidgirls.io/login
https://forum.candidgirls.io/tos
https://forum.candidgirls.io/guidelines
https://forum.candidgirls.io/categories
https://forum.candidgirls.io/u
https://forum.candidgirls.io/tag
https://forum.candidgirls.io/letter_avatar_proxy
images/emoji/
https://forum.candidgirls.io/c
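To see what the first (regex) entry actually covers, here is a quick Python check. It assumes the program evaluates that entry as a regular expression against the full URL. The optional group only allows a www. prefix, so links on forum.candidgirls.io never match it. The helper name is hypothetical:

```python
import re

# First entry of the Excluded links list, copied verbatim.
EXCLUDE_ROOT = re.compile(r"^https?://(www\.)?candidgirls\.io")


def is_excluded_root(url: str) -> bool:
    """True if the URL starts with http(s)://candidgirls.io or http(s)://www.candidgirls.io."""
    return EXCLUDE_ROOT.search(url) is not None


if __name__ == "__main__":
    for url in [
        "https://candidgirls.io/",
        "https://www.candidgirls.io/",
        "https://forum.candidgirls.io/t/fit-latina-in-onepiece-flex-gif/499923",
    ]:
        print(is_excluded_root(url), url)  # True, True, False
```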

In the Included links I have:

https://forum.candidgirls.io/t/
https://forum.candidgirls.io/uploads/    (this one is where the attachments are)

I have played a bit with the Excluded page parts to filter out everything on the page other than the image links, but I am still getting these extra links.

Once again, thank you.
Maksym
Site Admin
Posts: 2259
Joined: 02 Mar 2009, 17:02

Post by Maksym »

It looks like all the links with the extra "\" characters are located before the "</header>" tag, so adding one [ Excluded Page Part ] like this should do the job:

From:
To: </header>
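The Excluded Page Part above can be pictured as cutting a slice out of the page before links are collected. A minimal Python sketch of that idea, assuming an empty From: means "start of the page" and that the To: marker is removed together with the slice (both are assumptions about the program; drop_page_part, HREF and the sample page are hypothetical):

```python
import re

# Naive href extractor, good enough for this illustration only.
HREF = re.compile(r'href="([^"]+)"')


def drop_page_part(html: str, start: str, end: str) -> str:
    """Remove the text from `start` through `end` (inclusive).

    An empty `start` means "from the beginning of the page".
    If either marker is missing, the page is returned unchanged.
    """
    begin = 0 if start == "" else html.find(start)
    if begin == -1:
        return html
    stop = html.find(end, begin)
    if stop == -1:
        return html
    return html[:begin] + html[stop + len(end):]


if __name__ == "__main__":
    # Hypothetical page: a backslash-suffixed link in the header, a clean one after it.
    page = (
        '<header><a href="https://forum.candidgirls.io/t/x/1\\\\">x</a></header>'
        '<main><a href="https://forum.candidgirls.io/t/x/1">x</a></main>'
    )
    print(HREF.findall(page))  # both links
    print(HREF.findall(drop_page_part(page, "", "</header>")))  # only the clean one
```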

Another note: why do you exclude www.candidgirls.io? There are no links to that website from the forum. I think the only Excluded URL you need (if you want to go that way) is

candidgirls\.io

And the only [ Included URL ] you need is

/original/

I think you are overcomplicating things.
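To see which links each of the two suggested patterns would catch, here is a small Python check. It assumes both entries are regular expressions matched anywhere in the URL; how the program resolves a link that matches both lists is not covered here. The uploads URL below is hypothetical:

```python
import re

# The suggested Excluded URL and Included URL, treated as regexes (assumption).
EXCLUDED = [re.compile(r"candidgirls\.io")]
INCLUDED = [re.compile(r"/original/")]


def matches(patterns: list[re.Pattern], url: str) -> bool:
    """True if any pattern occurs anywhere in the URL."""
    return any(p.search(url) for p in patterns)


if __name__ == "__main__":
    samples = [
        "https://forum.candidgirls.io/t/fit-latina-in-onepiece-flex-gif/499923",
        "https://forum.candidgirls.io/uploads/default/original/1X/example.gif",  # hypothetical
    ]
    for url in samples:
        print(url, "excluded:", matches(EXCLUDED, url), "included:", matches(INCLUDED, url))
```

The thread URL matches only the excluded pattern, while the hypothetical upload URL matches both, so the result for attachments depends on how the program weighs the two lists.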
madnauseam

Post by madnauseam »

Maksym wrote: 29 Nov 2024, 11:24 I think you are overcomplicating things.

Thanks for your input, Maksym!

The idea I had in mind was to crawl a category, for example:
and download each thread from that category.

Perhaps I'm not going in the right direction and there is an easier way to do that?
Maksym

Post by Maksym »

You never mentioned you wanted the categories :( How did you plan to handle the pagination of the category pages?
madnauseam

Post by madnauseam »

So far I was allowing links of the form ".../t/..." and using the "Excluded page parts" to remove the links to other threads, keeping only the &page= links.

But yeah, I am a noob with this, so I'm sure there are better ways to do this.
Maksym

Post by Maksym »

Yeah, that's the easiest way. Simply add these two entries to the [ Included URLs ]:

/t/
page=\d+
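A quick way to sanity-check the two entries, assuming the program treats each [ Included URLs ] line as a regular expression searched anywhere in the URL. The category and user URLs are hypothetical, since the thread does not show a real category link:

```python
import re

# The two suggested [ Included URLs ] entries, treated as regexes (assumption).
INCLUDED = [re.compile(r"/t/"), re.compile(r"page=\d+")]


def is_included(url: str) -> bool:
    """True if any Included URL pattern occurs anywhere in the URL."""
    return any(p.search(url) for p in INCLUDED)


if __name__ == "__main__":
    samples = [
        "https://forum.candidgirls.io/t/fit-latina-in-onepiece-flex-gif/499923",  # thread
        "https://forum.candidgirls.io/c/example-category/1?page=2",  # hypothetical category page
        "https://forum.candidgirls.io/u/someone",  # hypothetical user profile
    ]
    for url in samples:
        print(is_included(url), url)  # True, True, False
```

Note that page=\d+ is not anchored, so it also matches a page= parameter with digits on any other URL the crawler sees.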
madnauseam

Post by madnauseam »

Thank you for all your help!