Anonymous views suddenly very high

I never know what to think of the anonymous views numbers, but they never seem to correspond to Google Analytics data in any meaningful way.

The last four or so days have brought this into focus, because there has been a huge, sustained increase in anonymous views, which is unusual.

It may be coincidence, but it began a little after we upgraded to 3/3.1 — could the upgrade be related?

At the same time, login stats look to have dropped significantly, which is concerning.

Generally it’s hard to know how to read the login stats either, as they don’t correspond to analytics numbers, but looking only at the dashboard there is a strange new trend.

As a simple example of the disparity between Google Analytics and the dashboard stats: you see 500 Google Analytics unique visits for a day, but on the Discourse dashboard 2,000 logins, 50,000 anonymous views and 5,000 crawlers.

What is going on overall with these stats?

How should we treat the data and what can it tell us in terms of managing a discourse forum?

Are anonymous views an indication of unsolicited farmed traffic and a waste of resources?

Such traffic is filtered from Google Analytics if you select the right option — maybe it isn’t filtered on the Discourse side. Could it indicate some kind of low-level DDoS-type traffic, for whatever bizarre and spurious reasons? Again, that would waste resources and possibly affect genuine logins.

No reports thus far of login issues.

Overall, how do we interpret the dashboard numbers?

Thanks for any insight and tips.

Hi @agemo !

I also see similar behaviour in the last couple of months. Did you manage to find something out about this?

Probably bots, possibly AI scraping bots.

What helped me immensely were the web crawler reports in admin, after someone tipped me off about their existence (I had never noticed them before). Using these I banned various crawlers, which dropped anon views. I do think crawlers come in as anon views too — I have no idea how.

The reports also give you the names of crawlers (user agents), so you can look up each crawler and see whether it has value.

This topic might also be of use

That could be. I also don’t know how, other than to guess it just depends on how the system identifies bots vs. users.

I’ve seen spikes of crawlers when more text is published, and also a seemingly random spike of anon views a few days ago.

We have a new metric in place now which splits out anon pageviews into ‘likely human’ and ‘probably bot’ so people can think of the latter more like crawlers (which they likely should be, but aren’t identifying themselves as such).

The report is part of the stock ones and can be found at /admin/reports/consolidated_page_views_browser_detection

There’s also some other work in progress to apply this to topic view metrics as well to prevent bot-bloating.

This new graph is helpful — it looks like the ‘probably bot’ category is labeled as ‘other pageviews’:

With the cheeky new bots who aren’t introducing themselves properly, is there any way to slow them down or identify their source?

I can check the web crawler user agent reports, but if they don’t show up there, I’m not sure what else to investigate.

Not really. But if the bot’s coder worked as they quite often do — meaning some part of the text in the user agent, device, system etc. is the same all the time — then you can block them totally, but you need a reverse proxy. robots.txt is only a guideline for well-behaved bots.
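As a rough illustration of that reverse-proxy approach, here is a minimal Nginx sketch that rejects requests whose user agent matches known-bad substrings before they reach Discourse. The bot names, hostname and backend port are placeholder assumptions — substitute the user agents you actually see in your crawler reports (the `map` block belongs in the `http` context of your config):

```nginx
# Sketch: block requests by User-Agent substring before proxying.
# The patterns below are made-up placeholders, not real bot names.
map $http_user_agent $bad_bot {
    default                 0;
    ~*SomeScraperBot        1;   # ~* = case-insensitive regex match
    ~*AnotherAggressiveBot  1;
}

server {
    listen 80;
    server_name forum.example.com;   # placeholder hostname

    if ($bad_bot) {
        return 403;                  # refuse matched user agents outright
    }

    location / {
        proxy_pass http://127.0.0.1:8080;   # assumes Discourse listens here
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Returning 403 is cheap for Nginx, so even persistent bots stop costing the Discourse app itself anything.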

Discourse’s own logs more or less just give a glimpse of the big picture. Such detailed data you must dig out of the Nginx logs — meaning: welcome to the console :smirk:
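To give an idea of that console digging, here is a small sketch: counting requests per user agent from an Nginx combined-format access log. The sample log lines and the bot name are invented for the demo — in practice you would point the `awk` at your real `/var/log/nginx/access.log`:

```shell
# Invented sample of an Nginx combined-format access log; real entries
# live in /var/log/nginx/access.log on a stock install.
cat > /tmp/access_sample.log <<'EOF'
192.0.2.10 - - [17/Aug/2024:10:00:01 +0000] "GET /latest HTTP/1.1" 200 1234 "-" "Mozilla/5.0"
203.0.113.5 - - [17/Aug/2024:10:00:02 +0000] "GET /t/foo/1 HTTP/1.1" 200 2345 "-" "SomeScraperBot/1.0"
203.0.113.5 - - [17/Aug/2024:10:00:03 +0000] "GET /t/bar/2 HTTP/1.1" 200 3456 "-" "SomeScraperBot/1.0"
EOF

# Split each line on double quotes: field 6 is the User-Agent string.
# Count requests per user agent, busiest first.
awk -F'"' '{ counts[$6]++ } END { for (ua in counts) print counts[ua], ua }' \
    /tmp/access_sample.log | sort -rn
# prints: 2 SomeScraperBot/1.0
#         1 Mozilla/5.0
```

A user agent with thousands of hits and no human-looking browsing pattern is a good candidate for the blocklist.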

WordPress can easily be brought to its knees by bots, but with Discourse the situation is more just annoying. Content stealing is today’s norm, and has been for a long time now.

A reverse proxy seems like a good first step — is Cloudflare good for that?

A local web-developer buddy of mine recommended that using Cloudflare nameservers can be good for security.

I’m not too concerned about published content being ‘stolen’. When text is published in public, people have a right to record it; it would only become a problem if they tried to sell it as their own creation.

I would suggest Nginx or Varnish. But maybe Cloudflare works too — I don’t know it, I’ve never used it.

I had an unusual spike of bot “other pageviews” yesterday, August 17th — 152 of them, very random for a mostly inactive site that usually has only about 15–20 of those a day.

Totally normal. For me, the best results came from blocking the worst user agents together with geo-blocking (mine isn’t a global forum, so I can easily do it).

Do you mean geographically banning IPs from countries other than Finland? That seems like a good idea for locally focused sites.

Yeah. Right now I get a lot of traffic from Russia, Singapore and China. Earlier it was India, Pakistan, Egypt, Iran and Iraq. And I bet they can’t speak Finnish :wink: It is possible with Russia, though, but… no.

The biggest three are the USA, France and the Netherlands, and Germany is growing. But that is because of data centers, and that’s why I can’t ban those.

But again, with Discourse those are mainly just annoying. With WordPress (and other LAMP stacks, I would say) they create such a big load that the situation starts to get closer to a DDoS.

And most of them are stupid script kiddies trying to knock over Discourse using ancient WordPress vulnerabilities.

But nowadays SEO and AI bots have started to become a real question mark.

But if one has a local forum, then geo-banning is just a wise move.
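For what a geo-ban can look like at the reverse-proxy level, here is a hedged Nginx sketch using the stock `geo` module. The CIDR ranges below are documentation-reserved placeholders, not real country allocations — in practice you would generate per-country lists from a GeoIP database (the `geo` block, like `map`, belongs in the `http` context):

```nginx
# Sketch: deny requests from listed IP ranges before proxying to Discourse.
# The ranges here are RFC 5737 documentation blocks, used as placeholders.
geo $blocked_geo {
    default          0;
    203.0.113.0/24   1;   # substitute real per-country CIDR lists here
    198.51.100.0/24  1;
}

server {
    listen 80;

    if ($blocked_geo) {
        return 403;       # refuse blocked ranges cheaply at the edge
    }

    location / {
        proxy_pass http://127.0.0.1:8080;   # assumes Discourse listens here
    }
}
```

Data-center ranges are the catch, as noted above: blanket-banning them also blocks legitimate services, so those usually have to be handled per user agent instead.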
