I had two spikes, on the 8th and 18th of January, both from Yandex, the Russian web crawler. Each time, attempted crawls more than doubled. The biggest snoop over time is PetalBot from PetalSearch.com, with 4x to 6x as many scans as Yandex and the other bots.
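If you copy the daily counts out of the web crawlers report (/admin/reports/web_crawlers), a spike like the ones above ("more than double") can be flagged by comparing each day against a baseline. A minimal sketch, assuming you already have a mapping of date to crawler page views; the function name `find_spikes`, the median baseline and the 2x factor are illustrative choices, not something Discourse provides:

```python
from datetime import date
from statistics import median


def find_spikes(daily_views: dict[date, int], factor: float = 2.0) -> list[date]:
    """Return the days whose page views exceed `factor` times the median day.

    `daily_views` is a hypothetical date -> page-view mapping, e.g. copied
    by hand from the admin report.
    """
    if not daily_views:
        return []
    # The median is robust to the spike days themselves, unlike the mean.
    baseline = median(daily_views.values())
    return sorted(day for day, views in daily_views.items() if views > factor * baseline)
```

A median is used rather than a mean so that one huge day (a bot "going wild") does not inflate the baseline and hide a second, smaller spike.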
Yesterday (02/05/2023) Seekport Bot went wild.
They seem to think they are legit; clearly wrong, though.
Another bot resulting in an outlier of excessive page views on a single day.
Date: 2023-05-04
Sometimes those hits are from a legit bot. Sometimes… something else. IP addresses quite often reveal the truth.
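One common way to let the IP address "reveal the truth" is forward-confirmed reverse DNS: resolve the IP to a hostname, check that the hostname belongs to the crawler's domain, then resolve that hostname forward and confirm it maps back to the same IP. A minimal sketch using only Python's `socket` module; the helper name `is_verified_crawler` is hypothetical, and the Yandex domain suffixes in the example are an assumption you should check against each crawler's own documentation:

```python
import socket
from typing import Callable

# Assumed suffixes for illustration; verify against the crawler's own docs.
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def is_verified_crawler(
    ip: str,
    allowed_suffixes: tuple[str, ...],
    reverse_lookup: Callable[[str], tuple] = socket.gethostbyaddr,
    forward_lookup: Callable[[str], tuple] = socket.gethostbyname_ex,
) -> bool:
    """Hypothetical forward-confirmed reverse DNS check (IPv4 only, since
    gethostbyname_ex returns IPv4 addresses)."""
    try:
        # Step 1: IP -> hostname (PTR record).
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    hostname = hostname.rstrip(".").lower()
    # Step 2: the hostname must sit under one of the crawler's domains.
    if not hostname.endswith(allowed_suffixes):
        return False
    try:
        # Step 3: hostname -> IPs, which must include the original IP.
        _name, _aliases, forward_ips = forward_lookup(hostname)
    except OSError:
        return False
    return ip in forward_ips
```

The lookups are injectable parameters so the check can be exercised without network access; in real use you would call `is_verified_crawler(ip, YANDEX_SUFFIXES)` with the defaults. A bot that merely copies Yandex's User-Agent string fails at step 2 or step 3.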
Anyway, those bots are totally useless: they basically only steal content and never give anything back. The only way to stop them is a reverse proxy. But AFAIK the situation is good with Discourse, because they did not increase the load much. In the WordPress world, a situation like this could bring a site down.
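What a reverse proxy does here is refuse requests before they reach the forum. As a stand-in illustration (in Python rather than any real proxy's configuration, and not how Discourse itself does it), the same idea can be sketched as a small WSGI middleware that returns 403 when the User-Agent contains a blocklisted token. The blocklist tokens and the name `block_user_agents` are hypothetical:

```python
from typing import Callable, Iterable

# Hypothetical blocklist: case-insensitive substrings of the User-Agent header.
BLOCKED_AGENTS = ("petalbot", "mj12bot", "seekportbot")


def block_user_agents(app: Callable, blocked: Iterable[str] = BLOCKED_AGENTS) -> Callable:
    """Wrap a WSGI app so matching User-Agents get 403 Forbidden."""
    # Normalise once; skip empty tokens, which would otherwise match everything.
    tokens = tuple(t.lower() for t in blocked if t)

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else passes through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Matching on User-Agent only stops bots that identify themselves; a crawler sending a browser-like User-Agent passes straight through, which is where the IP-based checks come in.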
Thanks for making my point!
How do you see this?
That looks like one of the standard reports. You should be able to find yours at /admin/reports/web_crawlers
AI Summary:
The discussion is about spikes in pageviews from web crawler bots on Discourse sites. Some bots that were identified as causing large spikes in pageviews include:
- MegaIndex bot: did around 4,000 pageviews in one day
- MJ12bot: did over 5,000 pageviews in one day
- Seekport bot: caused spikes on multiple occasions
- Yandex bot: caused pageviews to more than double on two occasions
- PetalBot from PetalSearch.com: did 4-6x more page scans than other bots
- DataForSEO bot: caused a spike of over 15,000 pageviews in one day
These spikes can sometimes cause performance issues. Ways to limit bot traffic include using robots.txt, though not all bots respect this. Other options are server-level blocking and using a reverse proxy. The bots are seen as “stealing content” without providing value.
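For the crawlers that do respect robots.txt, Python's standard-library `urllib.robotparser` can be used to sanity-check a robots.txt before deploying it. A minimal sketch; the user-agent tokens below and the example URL are assumptions for illustration and must match the tokens each crawler documents for itself:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block three crawlers entirely, allow everyone else.
ROBOTS_TXT = """\
User-agent: PetalBot
User-agent: MJ12bot
User-agent: SeekportBot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("PetalBot", "MJ12bot", "SeekportBot", "YandexBot"):
        print(agent, rp.can_fetch(agent, "https://forum.example.com/t/some-topic/123"))
```

An empty `Disallow:` under `User-agent: *` means "allow everything", so only the listed crawlers are refused. This only tells you what a well-behaved crawler should do; it does nothing against bots that ignore robots.txt.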
You posted an AI-generated summary here in response to a request by me, and now it is gone.
Did you delete it?
Ah yes I did, because your original request was removed too
Restored it now
Thanks.
I tend to remove replies that, after some time, add no value for someone reading the topic, provided removing them doesn’t leave holes in it. As it was a simple request and you created the summary, there was no need for others to read the request every time they visited this topic.
It is a habit I picked up on StackExchange sites, where I would leave comments and later delete them. I also created other, more useful comments for myself and others that are not directly related to the topic; those start with “Of interest”. I probably have a few hundred such comments on StackExchange sites. I really wanted something similar for Discourse, but the suggestion never gained traction.
While checking visits to our site, I found the page views were extremely unusual. Looking at the high-level overview, the excess starts on 10-23-2023 with what I am guessing is one anonymous user.
A check of
shows that the increase comes from a few bots that I did not immediately recognize:
- fidget-spinner-bot
- my-tiny-bot
- thesis-research-bot
Just passing this info along as it may be of value to you.
Happening to me too
I think we’ve seen a couple of instances of this. It seems to be a crawler that doesn’t identify itself as a crawler, so its hits get counted as ‘anonymous’ views.
(Edit by poster - this post was originally a new thread, since merged here, which is fine. Was titled “Curiosity: big reduction in crawler visits since early November (2023)”)
I don’t believe anything changed on my side at this point:
Anyone else see anything similar?
There’s no big exchange of numbers between Anon and Crawler, so it’s not a categorisation change.
Yes
Take a look at this topic and specifically this post
Since you posted a Consolidated Pageviews report, I take it you have admin access.
<site>/admin/reports/consolidated_page_views
Also make use of
<site>/admin/reports/web_crawlers
to identify which web crawler is doing the page views.
As some of us have discovered, these bots appeared recently and are causing the high numbers:
- fidget-spinner-bot
- my-tiny-bot
- thesis-research-bot
Ah yes, I had seen that topic, which was about increases. But indeed, all three of those were responsible for the high numbers. After the 8th they all disappeared, and we’re back to some kind of baseline, which explains the decrease.
(Mods: fine to glue this thread to the bottom of that one.) (Edit: thanks mods!)