How are self-hosters here dealing with bad crawlers?

Reading this thread: Devs say AI crawlers dominate traffic, forcing blocks on entire countries | Hacker News

I wonder what it’s like for self-hosted people to deal with crawlers that are practically running a non-stop DDoS, especially on instances within the Fediverse.

2 Likes

I think a good first step is to quantify for yourself how big of an issue this is using the “new” pageview metric:

If you’re seeing something like 60% non-human traffic, that’s probably fine and you don’t need to take action.
If it’s 95%.. yeah, might be time to start investigating solutions.
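One rough way to get that number yourself, assuming a standard combined-format Nginx/Apache access log (the log path and the user-agent patterns here are just placeholders to adapt to your own setup):

```shell
# Rough bot-share estimate from a combined-format access log.
# With -F'"', field $6 is the quoted User-Agent string.
# The UA patterns are only a sample; extend them from what your logs show.
awk -F'"' '
  tolower($6) ~ /bot|crawler|spider|gptbot|ccbot/ { bots++ }
  { total++ }
  END { printf "%d of %d requests (%.0f%%) look like bots\n", bots, total, 100*bots/total }
' /var/log/nginx/access.log
```

This undercounts crawlers that fake a browser UA, but it’s a quick sanity check before reaching for heavier tooling.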

The Blocked crawler user agents setting is the admin’s friend. The trash traffic isn’t such a big issue with Discourse because the load isn’t that heavy, but I’ve banned a handful of the worst offenders because I really dislike their business model. Everyone is crying about how AI companies are stealing content, which they are indeed doing, but SEO companies are much worse, and their bots are really greedy.
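For reference, that Discourse site setting takes a pipe-delimited list of user-agent substrings. These are real SEO-crawler UA tokens, but the selection is only illustrative, not the poster’s actual list:

```text
mj12bot|ahrefsbot|semrushbot|dotbot|blexbot
```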

But I’m using geo-blocking too, because I can. There are at least half a dozen countries that are steady sources of door-knockers and other malicious actors. But if a forum is for a global audience, that isn’t possible, of course.
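If you have that option, Nginx can do the geo-blocking itself. A sketch assuming the third-party ngx_http_geoip2 module and a MaxMind GeoLite2 country database; the blocked-country list is only an example, not a recommendation:

```nginx
# Requires ngx_http_geoip2 and a GeoLite2-Country.mmdb from MaxMind.
geoip2 /etc/nginx/GeoLite2-Country.mmdb {
    $geoip2_country_code country iso_code;
}

# Flag requests from countries you have decided to block.
map $geoip2_country_code $geo_blocked {
    default 0;
    CN      1;  # example entries only
    RU      1;
}

server {
    # ... normal site config ...
    if ($geo_blocked) {
        return 403;
    }
}
```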

With my WordPress sites, I do the same thing using Nginx with the help of Varnish.
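On the Nginx side, a user-agent block is typically done with a `map`. A sketch only — the UA list and the 403 response are illustrative, not the poster’s actual config:

```nginx
# Map greedy crawler user agents to a flag; extend the list from your logs.
map $http_user_agent $bad_bot {
    default          0;
    ~*semrushbot     1;
    ~*ahrefsbot      1;
    ~*mj12bot        1;
}

server {
    listen 80;
    # ... normal site config ...
    if ($bad_bot) {
        return 403;  # or 444 to drop the connection without a response
    }
}
```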

At the moment, the ratio of humans vs. bots is something like 50/50 in my forum.

1 Like

BTW, the tag isn’t right, I assume.

I agree, the AI tag has a plugin icon so I assume it is meant for the AI plugin only. I’ve removed it.

Crawler content gets heavily cached, so in practice I’ve never seen them able to DDoS.

Are you actually having performance issues because of this?

5 Likes

I wish I could say I had a solution that was free, or didn’t involve an outside service. I put my biggest forum behind bunny.net’s CDN. They have a generous free tier, but for that forum I go ahead and pay the $10/month for their security service. It lets me block crawlers and DDoS attacks, and filter by geography. As CDNs go, they’re really cheap but effective, and they’re not Cloudflare. A lot of folks on the Fediverse rate them highly.

I have a graph from their Shield service. (I’m a n00b, only 1 graph per reply :slight_smile: ) In the first, there were 484K bot connections out of 2M connections overall; I had just moved to the CDN and didn’t have any filtering or blocking in place. The next shows 11K bots, plus 90K blocked by access lists (I block China and Russia and maybe a couple of others). So that’s about 100K from bots out of 700K total requests that week.

After:

2 Likes

I was, but I set up some rules to handle it.

Chandler Bing: 'Yeah, but I'm so much faster'

Cloudflare has always been good to me, and I’ve never had to pay for anti-bot services. That, plus their newer features like the anti-AI tools, is what keeps me a customer (and a shill for them, I guess). Don’t want AI scrapers stealing your data? Just use one of their managed rules (though it’s entirely possible with just a normal robots.txt, like I use on my site).
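For the robots.txt route, the major AI crawlers publish their user-agent tokens. A minimal file like this opts out of the big ones (GPTBot is OpenAI’s crawler, CCBot is Common Crawl’s, and Google-Extended is Google’s AI-training opt-out token), though compliance is entirely voluntary on the crawler’s part:

```text
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```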

…vs. a generic managed one; way better.

Whether or not these startups actually listen to and respect the file is another story, but good on them for trying at least. None of my sites have had issues with bots in the past, and I’m still consistently happy with the ability to block common WordPress exploits directly there after reading my logs.

1 Like

Facebook (Meta) did something like that: if I disable the “AI crawlers control”, Meta simply makes 9K requests per hour, so the only way is to block them all.

On the Fediverse I haven’t had this problem for a while, but I’m waiting for more ActivityPub updates, because even though I haven’t had any problems with it, my bandwidth would still be eaten for nothing.


Absolutely correct. I’m using a Lemmy server that runs behind CF, and their admin posted this tutorial:


Same here; my current rules are:

not (cf.client.bot and (lower(http.user_agent) contains "googlebot" or lower(http.user_agent) contains "bingbot")) and ip.src != IP_BYPASS

And based at that lemmy server above:

(starts_with(http.user_agent, "Mozilla/") and http.request.version in {"HTTP/1.0" "HTTP/1.1" "HTTP/1.2" "SPDY/3.1"} and any(http.request.headers["accept"][*] contains "text/html") and http.user_agent wildcard r"HeadlessChrome/*" and http.request.uri.path contains "/xmlrpc.php" and http.request.uri.path contains "/wp-config.php" and http.request.uri.path contains "/wlwmanifest.xml" and ip.src.asnum in {200373 198571 26496 31815 18450 398101 50673 7393 14061 205544 199610 21501 16125 51540 264649 39020 30083 35540 55293 36943 32244 6724 63949 7203 201924 30633 208046 36352 25264 32475 23033 31898 210920 211252 16276 23470 136907 12876 210558 132203 61317 212238 37963 13238 2639 20473 63018 395954 19437 207990 27411 53667 27176 396507 206575 20454 51167 60781 62240 398493 206092 63023 213230 26347 20738 45102 24940 57523 8100 8560 6939 14178 46606 197540 397630 9009 11878 49453 29802} and http.user_agent wildcard r"Mozilla/*" and not cf.client.bot and not ip.src in {BYPASS_IP_1 RANGE_IP.0/23 RANGE_IP_2/24}) or (ip.src.country in {"T1" "XX"}) or (http.request.version in {"HTTP/1.0" "SPDY/3.1" "HTTP/1.2"})

For me, that’s enough.

These rules helped me get through a DDoS (not sure if it really was one) last month.

This isn’t the place to really discuss the merits of Cloudflare, but my problem with them is not good people like you. My problem with them is all the bad people they’re perfectly willing to do business with. Anyone in the cybersecurity world who fights malware and botnets sees Cloudflare come up a lot. Likewise, anyone who fights extremists online knows how often Cloudflare will protect extremist sites where other providers won’t. It’s not that they’re ineffective or too expensive. It’s the lack of morals in selecting their clientele.

2 Likes