Anonymous views suddenly very high

agemo · January 15, 2023, 12:43pm

I never know what to make of the anonymous views numbers, but they never seem to correspond to Google Analytics data in any meaningful way.

The last four or so days have brought this into focus because there has been a huge, sustained increase in anonymous views, which is unusual.

It may be coincidental, but it began a little after upgrading to 3/3.1, so could it be related?

At the same time, login stats look to have dropped significantly, which is concerning.

Generally it is hard to know how to view login stats either, as they don't correspond to analytics numbers, but looking only at the dashboard this is a strange new trend.

On the disparity between Google Analytics and dashboard stats, here is a simple example: you see 500 Google Analytics unique visits for a day, but on the Discourse dashboard 2000 logins, 50000 anonymous views and 5000 crawlers.

What is going on overall with these stats?

How should we treat the data and what can it tell us in terms of managing a discourse forum?

Are anonymous views an indication of unsolicited farmed traffic and a waste of resources?

Such traffic is filtered from Google Analytics if you select the right option. Maybe it isn't filtered on the Discourse side, and it could indicate some kind of low-level DDoS-type traffic, for whatever bizarre and spurious reasons, again wasting resources but possibly also affecting genuine logins?

No reports thus far of login issues.

Overall, how do we interpret the dashboard numbers?

Thanks for any insight and tips.

LiraLemur · July 5, 2024, 10:18am

Hi @agemo !

I also see similar behaviour in the last couple of months. Did you manage to find something out about this?

agemo · July 10, 2024, 1:35pm

Probably bots, possibly AI scraping bots.

What helped me immensely were the web crawler reports in admin, after I was tipped off to their existence (I had never noticed them before). Using these I banned various crawlers, which dropped anon views. I do think crawlers come in on anon views too. I have no idea how.

The reports also give you the names of crawlers (user agents), so you can look up each crawler to see if it has value.

These topics might also be of use:

Why there is huge difference in Pageviews tracking number on discourse admin dahboard and on google analytics
Has anyone seen the OpenAI web crawler GPTBot visit their site?

Architect · July 10, 2024, 1:46pm

That could be. I also don't know how, other than guessing that it just depends on how the system identifies bots vs. users.

I've seen spikes of crawlers when more text is published, and also a seemingly random spike of anon views a few days ago.

JammyDodger · July 10, 2024, 2:01pm

We have a new metric in place now which splits out anon pageviews into ‘likely human’ and ‘probably bot’ so people can think of the latter more like crawlers (which they likely should be, but aren’t identifying themselves as such).

The report is part of the stock ones and can be found at /admin/reports/consolidated_page_views_browser_detection

There’s also some other work in progress to apply this to topic view metrics as well to prevent bot-bloating.

Architect · July 11, 2024, 10:18pm

This new graph is helpful. It looks like the ‘probably bot’ category is labeled as ‘other pageviews’.

With the cheeky new bots who aren't introducing themselves properly, is there any way to slow those down or identify their source?

I can check the web crawler user agent reports, but if they don't show up there I'm not sure what else to investigate.

Jagster · July 12, 2024, 6:58am

No. If the bot's coder has worked the way they quite often do, meaning some part of the text in the user agent, device, system etc. is the same every time, then you can block them totally, but you need a reverse proxy. robots.txt is only a guideline for well-behaved bots.

The logs Discourse shows give more or less just a glimpse of the big picture. Data that detailed you must dig out of the Nginx logs, meaning: welcome to the console.
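As a rough illustration of that kind of digging, here is a minimal Python sketch that tallies user agents from access-log lines. It assumes the stock Nginx "combined" log format; a given install may log in a different format, in which case the pattern needs adjusting. The function name `top_user_agents` and the sample lines are made up for the example.

```python
import re
from collections import Counter

# Assumed layout (Nginx "combined" format):
# ip - user [time] "request" status bytes "referer" "user agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Made-up sample lines using documentation IP ranges.
SAMPLE_LINES = [
    '203.0.113.5 - - [12/Jul/2024:06:58:00 +0000] "GET /latest HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [12/Jul/2024:06:58:01 +0000] "GET /top HTTP/1.1" 200 734 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [12/Jul/2024:06:58:02 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def top_user_agents(lines, n=10):
    """Return the n most frequent user agents as (agent, hits) pairs."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # lines that do not fit the assumed format are skipped
            counts[match.group("agent")] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    for agent, hits in top_user_agents(SAMPLE_LINES):
        print(hits, agent)
```

Passing an open log file instead of the sample list works the same way, since the function only iterates over lines. A user agent string that keeps turning up with a high count is a candidate for blocking at the reverse proxy.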

WordPress can easily be brought to its knees by bots, but with Discourse the situation is more just annoying. Content stealing is today's norm, and has been for a long time now.

Architect · July 12, 2024, 4:08pm

A reverse proxy seems like a good first step. Is Cloudflare good for that?

A local web-developer buddy recommended using Cloudflare nameservers, saying they can be good for security.

I'm not too concerned about published content being ‘stolen’. When text is published in public, people have a right to record it; it would only become a problem if they tried to sell it as their own creation.

Jagster · July 12, 2024, 4:10pm

I would suggest Nginx or Varnish. But maybe Cloudflare works too. I don't know, I've never used it.

Architect · August 18, 2024, 2:25am

Had an unusual spike of bot “other pageviews” yesterday, August 17th: 152, which is very random for a mostly inactive site that usually has only about 15-20 of those a day.

Jagster · August 18, 2024, 7:35am

Totally normal. I got the best results by combining blocking of the worst user agents with geo-blocking (mine isn't a global forum, so I can easily do it).

Architect · August 18, 2024, 10:26am

Do you mean geographical banning of IPs from countries other than Finland? That seems like a good idea for locally focused sites.

Jagster · August 18, 2024, 10:39am

Yeah. Right now I would get a lot of traffic from Russia, Singapore and China. Earlier it was India, Pakistan, Egypt, Iran and Iraq. And I bet they can't speak Finnish. It is possible with Russia, though, but… no.

The biggest three are the USA, France and the Netherlands, and Germany is growing. But that is because of data centers, and that's why I can't ban those.

But again, with Discourse those are mainly just annoying. With WordPress (and other LAMP stacks, I would say) those create such a big load that the situation starts to get close to a DDoS.

And most of it comes from stupid script kiddies that try to knock over Discourse using ancient WordPress issues.

But nowadays SEO and AI bots have started to be a real question mark.

But if one has a local forum then geo-banning is just a wise move.

agemo · June 21, 2025, 9:36pm

This may be picking up a problematic pace.

Have seen what I suspect is AI-enabled bot traffic that was closing in on DDoS-level disruption, as the Discourse service was starting to complain.

Not a highly powered setup, but for expected normal demand there is normally some headroom.

This time it showed up as huge anonymous traffic and ‘other’.

This mapped perfectly to the increased server CPU, load and disk I/O stats.

As a user here I got a lot of flak and many (temp) bans for decrying the wildly enthusiastic adoption of AI, which is now well and truly coming back to bite in so many ways (like job losses, and now this). This may be a continuation of the OP and nothing but the latest AI-enabled web bot traffic making itself known. Oh boy.

Back then my view was that it was (also) the time to be thinking about all strategies to mitigate for the customer/end user, not simply joining the arms race as a sub-partner. That Musk style of logic, ‘if you can't beat 'em, join 'em’, is in this instance easy to say but not the correct option, and the call for regulation naive.

Stand back?

Maybe too late now.

The AI traffic may come in more human-like. Technically I do not know how that works (but I know how we got here), other than that it probably passes itself off as human traffic more easily and presents less detectable traffic that also looks desirable from Google's point of view. But oh dear, this may be a bigger new problem.

Nothing is ever FREE. I dunno how so many (again) got so blindsided by this and did not apply human-level caution and choose a stand-back option.

Right now that traffic still comes in from very specific regions, and even ASN blocks are enough to surgically take out the heat.
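A minimal sketch of what an ASN-level block boils down to: an ASN announces a set of IP prefixes, and blocking it means refusing any address that falls inside one of them. Looking up which prefixes belong to an ASN is outside this sketch. The ranges below are reserved documentation ranges used as placeholders, and `is_blocked` is a hypothetical helper, not part of Discourse or any firewall.

```python
import ipaddress

# Placeholder prefixes (reserved documentation ranges), standing in for
# the prefixes announced by whichever ASNs you decide to block.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(ip_text, networks=BLOCKED_RANGES):
    """Return True if the address falls inside any blocked prefix."""
    address = ipaddress.ip_address(ip_text)
    return any(address in network for network in networks)


if __name__ == "__main__":
    for ip in ("203.0.113.77", "192.0.2.1", "2001:db8::1"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

In practice this check would live in the firewall or reverse proxy rather than in a script, but the matching logic is the same.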

For how long?

ryan_olsen · June 21, 2025, 10:48pm

This is pretty normal. I run a bunch of sites, and Cloudflare usually shows about 10x to 30x my real traffic. If they don't trigger analytics, they are bots or search engine crawlers, as most bots will not run the JavaScript used for analytics.

ryan_olsen · June 21, 2025, 10:50pm

Cloudflare is free.

agemo · June 21, 2025, 10:51pm

These appeared in Google Analytics. That was what was different, IIRC.

ryan_olsen · June 21, 2025, 10:53pm

If you're really worried, get Cloudflare and firewall the offending countries. If your IP was already exposed in DNS, get a new IP address. That is, if you are being attacked.

agemo · June 21, 2025, 11:01pm

Indeed, the server was already on the CF DNS but not proxied, as I still thought that did not work, going by old setup advice. You know, the fear of the orange cloud is strong.

However, I tried it out during one of the waves and mitigated the volume relatively easily after some watching. It does seem to have stripped out a lot more traffic besides.

Is the only way to get a new IP address to move to a new server?

ryan_olsen · June 21, 2025, 11:08pm

Depends on your hosting service. Some, like DigitalOcean, can just assign a new static IP address in the dashboard; with some you may need to ask them. I never turn it off. If I turn the orange cloud off, I consider that IP compromised. If you lose traffic from turning it on, your SSL setting is likely not set right, or caching isn't right. Doing live swaps to Cloudflare can be tricky if you don't already have the SSL dialed in, as it's hard to get an uncached IP address from the DNS to test with.
