MegaIndex bot did about 4,000 pageviews on one day

I had two spikes, on the 8th and 18th of January, both from Yandex, the Russian web crawler. Both times, attempted crawls more than doubled. The biggest snoop over time is PetalBot from PetalSearch.com, which made 4x-6x as many scans as Yandex and the other bots.
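
For anyone who wants to flag days like these automatically, here is a minimal sketch. It assumes you have copied the daily crawler pageviews into a dict yourself; the `find_spikes` name, the "more than double the median" rule, and the sample numbers are all made up for illustration and are not something Discourse provides.

```python
from statistics import median


def find_spikes(daily_counts, factor=2.0):
    """Return the days whose count exceeds `factor` times the median day.

    `daily_counts` maps a date string to that day's crawler pageviews.
    The median is used as the baseline so a single huge day does not
    drag the baseline up with it.
    """
    if not daily_counts:
        return {}
    baseline = median(daily_counts.values())
    return {
        day: count
        for day, count in daily_counts.items()
        if count > factor * baseline
    }


# Invented sample data, only to show the call shape.
sample = {
    "2023-01-07": 900,
    "2023-01-08": 2400,
    "2023-01-09": 1000,
    "2023-01-10": 950,
    "2023-01-18": 2300,
}
print(find_spikes(sample))
```

With this sample the median day is 1000 pageviews, so only the two days above 2000 are reported.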

1 Like

Yesterday 02/05/2023 Seekport Bot went wild

1 Like

Another bot resulting in an outlier of excessive page views on a single day.

Date: 2023-02-23

1 Like

they seem to think that they are legit, clearly wrong tho

1 Like

Another bot resulting in an outlier of excessive page views on a single day.

https://bot.seekport.com/

Date: 2023-05-04

1 Like

Sometimes those hits come from a legit bot. Sometimes… something else. The IP addresses quite often reveal the truth.
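
One common way to check an IP is forward-confirmed reverse DNS: look up the hostname for the IP, check that it sits under the crawler operator's domain, then resolve that hostname and confirm it maps back to the same IP. A rough sketch of that idea follows. `verify_crawler_ip` is a hypothetical helper, the domain suffixes you pass in are an assumption you should take from the crawler operator's own documentation, and this version only handles IPv4.

```python
import socket


def verify_crawler_ip(ip, allowed_suffixes, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a claimed crawler IP.

    `allowed_suffixes` should include the leading dot (for example
    ".example.com") so that "evil-example.com" does not match.
    The two lookup arguments exist only so the function can be tried
    without network access; by default the system resolver is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record, or the lookup failed

    suffixes = tuple(s.lower() for s in allowed_suffixes)
    if not host.lower().rstrip(".").endswith(suffixes):
        return False  # hostname is not under the operator's domain

    try:
        return ip in forward_lookup(host)  # must map back to the same IP
    except OSError:
        return False


# Example call (needs network access, so it is left commented out):
# verify_crawler_ip("203.0.113.5", [".example.com"])
```

A bot that fakes its user agent will usually fail the first or second step, because it cannot control the reverse DNS of the address it connects from.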

Anyway, those bots are totally useless: they basically only steal content and never give anything back. The only way to stop them is a reverse proxy. But AFAIK the situation is good with Discourse, because they don't increase the load that much. In the WordPress world, a situation like this could take a site down.

2 Likes

Thanks for making my point!

1 Like

how do you see this?

1 Like

That looks like one of the standard reports. You should be able to find yours at /admin/reports/web_crawlers :+1:

3 Likes

i hate palo alto

2 Likes

AI Summary:

The discussion is about spikes in pageviews from web crawler bots on Discourse sites. Some bots that were identified as causing large spikes in pageviews include:

  • MegaIndex bot: did around 4,000 pageviews in one day
  • MJ12bot: did over 5,000 pageviews in one day
  • Seekport bot: caused spikes on multiple occasions
  • Yandex bot: caused pageviews to more than double on two occasions
  • PetalBot from PetalSearch.com: did 4-6x more page scans than other bots
  • DataForSEO bot: caused a spike of over 15,000 pageviews in one day

These spikes can sometimes cause performance issues. Ways to limit bot traffic include using robots.txt, though not all bots respect this. Other options are server-level blocking and using a reverse proxy. The bots are seen as “stealing content” without providing value.
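
As an illustration of the robots.txt option, here is a small sketch using Python's standard `urllib.robotparser` to check what a set of rules would allow. The rules in `ROBOTS_TXT` and the `is_allowed` helper are assumptions made up for this example, not the robots.txt any particular Discourse site ships, and they only affect bots that choose to honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out two of the bots named above entirely,
# and keep every other crawler out of /admin/.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def is_allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if `user_agent` may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(is_allowed("MJ12bot", "/t/some-topic/123"))
print(is_allowed("Googlebot", "/t/some-topic/123"))
print(is_allowed("Googlebot", "/admin/reports/web_crawlers"))
```

Pass the bot's short name (such as `MJ12bot`) rather than a full user-agent header, because the parser matches on the product token before the first slash.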

1 Like

@Bas

You posted an AI-generated summary here, in response to a request by me, and now it is gone.

Did you delete it?

Ah yes I did, because your original request was removed too :slight_smile:
Restored it now

Thanks.

I tend to remove replies that after some time are of no value in reading the topic but that don’t leave holes in the topic. As it was a simple request and you created the summary, there was no need for others to read the request every time they visited this topic.

It is a habit I picked up from StackExchange sites, where I would leave comments and later delete them. There are also other, more useful comments that I created for myself and others, not directly related to the topic, that start with "Of interest". I probably have a few hundred such comments on StackExchange sites. I really wanted something similar for Discourse, but the suggestion never gained traction.

1 Like

While checking visits to our site, I found the page views extremely unusual. Looking at the high level overview

the excess starts on 10-23-2023 with what I am guessing is one anonymous user.

A check of

shows that the increase changes to a few bots that I did not immediately recognize

  • fidget-spinner-bot
  • my-tiny-bot
  • thesis-research-bot

Just passing this info along as it may be of value to you.

2 Likes

Happening to me too

3 Likes

I think we’ve seen a couple of instances of this. It seems like it’s a crawler that’s not saying it’s a crawler, so it gets counted as ‘anonymous’ views.
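
When the traffic hides in the anonymous bucket, the server access log is one place to look. The sketch below counts requests per client IP and user agent. It assumes log lines in the common "combined" format, which may not match what your install writes, and `top_clients` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the "combined" access log format:
# ip - user [time] "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def top_clients(lines, n=5):
    """Return the n busiest (ip, user_agent) pairs with their hit counts."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # silently skip lines in any other format
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts.most_common(n)


# Example call, with a path that is only a placeholder:
# with open("access.log") as log:
#     for (ip, agent), hits in top_clients(log):
#         print(hits, ip, agent)
```

A single address with thousands of hits and a browser-like user agent is a good candidate for the kind of undeclared crawler described above.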

1 Like

(Edit by poster - this post was originally a new thread, since merged here, which is fine. Was titled “Curiosity: big reduction in crawler visits since early November (2023)”)

I don’t believe anything changed on my side at this point:

Anyone else see anything similar?

There’s no big exchange of numbers between Anon and Crawler, so it’s not a categorisation change.

3 Likes

Yes

Take a look at this topic and specifically this post

Since you posted a Consolidated Pageviews report I take it you have admin access.

<site>/admin/reports/consolidated_page_views

also make use of

<site>/admin/reports/web_crawlers

to identify which web crawler is doing the page views.

As some of us have discovered, these bots appeared recently and are causing the high numbers:

  • fidget-spinner-bot
  • my-tiny-bot
  • thesis-research-bot

1 Like

Ah yes, I had seen that topic, which was about increases. But indeed, all three of those were responsible for the high numbers - after the 8th, they have all gone, and we’re back to some kind of baseline. Which explains the decrease.

(Mods: fine to glue this thread to the bottom of that one.) (Edit: thanks mods!)

3 Likes