If you're getting 403 Forbidden errors browsing the forum

Announcements of changes to the forums
fluffy
Eisenhower
Posts: 11267
Joined: Sat Sep 25, 2004 10:56 am
Instruments: sometimes
Recording Method: Logic Pro X
Submitting as: Sockpuppet
Pronouns: she/they
Location: Seattle-ish

Post by fluffy »

In the ongoing fight against AI crawler bots, I made a couple of changes to the forum which might cause some people to temporarily run into a bunch of 403 Forbidden errors. If that's happening to you, make sure there isn't a "sid=" in the URL, and reload any older forum pages so that you get new links without it.

Technical explanation

Pretty much every modern website uses cookies (little pieces of data sent alongside the webpage) to keep track of who's logged in. Back when phpBB was written, a lot of people were hesitant to allow cookies, so phpBB would also put the session ID into the URL (that's the "sid=" part). This is vestigial functionality that has been completely unnecessary for over two decades.
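
To make the mechanism concrete, here's a minimal Python sketch of URL-based sessions in general. It is not phpBB's actual code (phpBB is written in PHP and has its own rules for when it adds the SID); the `build_link` function and the `/viewtopic.php?t=123` path are just illustrations. The idea is that when the browser hasn't sent the session cookie back, the server tacks the session ID onto every link it generates so the session survives without cookies.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_link(url: str, session_id: str, sent_session_cookie: bool) -> str:
    """Return a link, appending sid=<session_id> only if cookies aren't in use.

    Simplified illustration of URL-based sessions; not phpBB's actual logic.
    """
    if sent_session_cookie:
        # The browser returned the cookie, so the URL doesn't need the session ID.
        return url
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("sid", session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(build_link("/viewtopic.php?t=123", "abc123", sent_session_cookie=False))
# /viewtopic.php?t=123&sid=abc123
print(build_link("/viewtopic.php?t=123", "abc123", sent_session_cookie=True))
# /viewtopic.php?t=123
```

Every new session means a new set of links, which is where the trouble with crawlers starts.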

As part of its session system, phpBB also assigns session IDs to folks who aren't logged in (for some reason). It keeps a list of known bots that it won't assign them to, so that search engines don't see random data in the URL and end up crawling every single page on a site a billion times. Most unknown crawlers were also smart enough to strip the sid value from the URL anyway.
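
Here's roughly what "stripping the sid value" looks like for a well-behaved crawler, as a Python sketch (the `strip_sid` helper and the example.com URLs are made up for illustration). Without that normalization, three visits to the same topic with three different session IDs look like three different pages.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_sid(url: str) -> str:
    """Drop any sid=... parameter so URLs differing only by session ID compare equal."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "sid"]
    # Note: urlencode may re-encode the remaining values slightly differently than the original.
    return urlunsplit(parts._replace(query=urlencode(query)))


crawled = [
    "https://example.com/viewtopic.php?t=1&sid=aaa",
    "https://example.com/viewtopic.php?t=1&sid=bbb",
    "https://example.com/viewtopic.php?t=1&sid=ccc",
]
print(len(set(crawled)))                     # 3 "different" pages
print(len({strip_sid(u) for u in crawled}))  # 1 actual page
```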

Unfortunately, with the advent of aggressive AI crawlers, there's been an explosion of poorly written bots that also go out of their way to disguise themselves as regular web browsers. Since they don't match phpBB's list of known bots, they get session IDs in their URLs, and each of these crawlers ends up seeing an infinite number of unique URLs. In their quest to extract Every Last Byte of Information, they are super aggressive about trying to visit every single one. For the past several months, our little forum has been besieged by millions of requests from hundreds of thousands of IP addresses, and it's a wonder things have stayed up as well as they have.

Whenever there's been a major outage, I've gone into the logs and found groups of aggressive crawlers to block by IP address. But the AI companies have seen their bots getting blocked, and instead of doing the smart thing and fixing their fucking crawlers so they aren't so disastrous to the Internet, they've decided to spread the load out by launching massive botnets that come from every IP address they can get their hands on. Huge data centers around the world (especially in developing nations) are part of this, and I suspect that this is also the purpose of apps like HoneyGain, which reward people for running a "network speed testing" app on as many devices as possible, basically turning the entire Internet into a giant AI botnet.
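
For the curious, the "find groups of crawlers to block" step can be sketched in Python like this: bucket the client IPs from an access log by network and look at the busiest buckets. This is not what the forum actually runs; the `busiest_networks` name, the /24 (IPv4) and /48 (IPv6) bucket sizes, and the sample addresses (from the documentation ranges) are all assumptions for the sketch. It also hints at why the botnet approach defeats it: when hundreds of thousands of addresses each send a handful of requests, no single bucket stands out.

```python
import ipaddress
from collections import Counter


def busiest_networks(ips, top=3):
    """Count requests per network (/24 for IPv4, /48 for IPv6) and return the busiest."""
    counts = Counter()
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        prefix = 24 if addr.version == 4 else 48
        counts[ipaddress.ip_network(f"{addr}/{prefix}", strict=False)] += 1
    return counts.most_common(top)


# One entry per request, as pulled from an access log.
log_ips = ["203.0.113.5", "203.0.113.77", "203.0.113.200", "198.51.100.1", "2001:db8::1"]
for network, hits in busiest_networks(log_ips):
    print(network, hits)
```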

Anyway, phpBB is finally removing SIDs from URLs in the upcoming 4.0 release, but for various reasons they've opted not to make this change in 3.x (which is what everyone is currently running, and which many sites will keep running for quite some time). And even once URL SIDs are gone, there will still be a giant backlog of bots exploring the old, known URLs, which will continue to work indefinitely. So the only real way to stop the torrent of suck is to make those URLs invalid.
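
In the abstract, "making those URLs invalid" is just a request filter that refuses anything with a `sid` query parameter. Here's a minimal sketch written as Python WSGI middleware purely for illustration; the forum itself runs phpBB (PHP), so this is not the actual rule it uses, and `forbid_sid_urls` and `forum_app` are hypothetical names.

```python
from urllib.parse import parse_qs


def forbid_sid_urls(app):
    """Wrap a WSGI app so any request with a sid query parameter gets 403 Forbidden."""
    def middleware(environ, start_response):
        # keep_blank_values=True so that a bare "sid=" is caught too.
        query = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
        if "sid" in query:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def forum_app(environ, start_response):
    # Stand-in for the real application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"forum page\n"]


application = forbid_sid_urls(forum_app)
```

For a quick local try, `application` can be handed to the standard library's `wsgiref.simple_server.make_server`.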

Unfortunately, there's no way to distinguish between a real user looking at an old URL and a shitty AI bot, so for now there will be some choppiness for some people. Edit: Never mind, I modified the access rule to check whether you've got a login cookie, in which case you should never be forbidden.
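
The revised rule boils down to "forbid sid URLs unless there's a login cookie." A minimal Python sketch of that check, assuming a hypothetical cookie name `forum_login` (a real phpBB board's cookie names come from its configured cookie prefix, so the actual name will differ) and a hypothetical `is_forbidden` helper:

```python
from http.cookies import CookieError, SimpleCookie
from urllib.parse import parse_qs

# Hypothetical name; a real phpBB board's cookie names depend on its configured cookie prefix.
LOGIN_COOKIE = "forum_login"


def is_forbidden(query_string: str, cookie_header: str) -> bool:
    """Forbid sid URLs, except for clients that present a login cookie."""
    has_sid = "sid" in parse_qs(query_string, keep_blank_values=True)
    cookies = SimpleCookie()
    try:
        cookies.load(cookie_header)
    except CookieError:
        pass  # A malformed Cookie header shouldn't crash the filter.
    return has_sid and LOGIN_COOKIE not in cookies


print(is_forbidden("t=1&sid=abc", ""))                # True: old sid URL, no cookie
print(is_forbidden("t=1&sid=abc", "forum_login=42"))  # False: logged-in user
print(is_forbidden("t=1", ""))                        # False: no sid at all
```

Checking only for the cookie's presence is cheap, but a bot could send a cookie with that name too; a stricter version would validate the session behind it.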
Aciniform Artifice
Karski
Posts: 93
Joined: Wed Aug 17, 2022 9:41 am
Pronouns: he/him
Location: Pennsylvania, USA

Re: If you're getting 403 Forbidden errors browsing the forum

Post by Aciniform Artifice »

Ironically (maybe?) I followed the link to this topic from Discord and it was fine. But then I noticed I wasn't logged into the forum so I hit login, and THEN it gave me a 403.

Re: If you're getting 403 Forbidden errors browsing the forum

Post by fluffy »

Interesting! There might be something missing from the addon. I'll investigate it!

Re: If you're getting 403 Forbidden errors browsing the forum

Post by fluffy »

Yep, looks like the login page generates an SID in the URL. Well, that's annoying. There's a silly workaround I can try for that, at least.

Re: If you're getting 403 Forbidden errors browsing the forum

Post by fluffy »

Okay, I've added a simple workaround, although it won't work for people who are blocking HTTP referers. The Internet is a pile of hacks!
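
The post doesn't say exactly how the referer-based workaround works, so this is only a guess at its general shape, as a Python sketch: let `sid` URLs through when the Referer header points back at the forum itself. The `forum.example.com` host and the `allowed` helper are hypothetical. It refuses the request when the Referer is missing, which is exactly the situation for people who block referers.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

FORUM_HOST = "forum.example.com"  # Hypothetical host name for this sketch.


def allowed(query_string: str, referer: Optional[str]) -> bool:
    """Let sid URLs through only when the Referer points back at the forum itself."""
    if "sid" not in parse_qs(query_string, keep_blank_values=True):
        return True
    if not referer:
        # No Referer (e.g. the user blocks it): the workaround can't help here.
        return False
    return urlsplit(referer).hostname == FORUM_HOST
```

The Referer header is entirely under the client's control, so a bot could send a matching one as well.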

Never mind, figured out a better workaround: if you have a login cookie it shouldn't give you the forbidden error.