I never know what to make of the anonymous views numbers; they never seem to correspond to Google Analytics data in any meaningful way.
The last four or so days have brought this into focus because there has been a huge, sustained increase in anonymous views, which is unusual.
It may be coincidental, but could it be related to upgrading to 3/3.1, since it began a little after?
I’m also seeing that login stats look to have dropped significantly, which is concerning.
Generally it’s hard to know how to view login stats either, as they don’t correspond to analytics numbers, but looking only at the dashboard there is a strange new trend.
In terms of the disparity between Google Analytics and dashboard stats, here is a simple example: you see 500 Google Analytics unique visits for a day, but on the Discourse dashboard 2000 logins, 50000 anonymous views and 5000 crawlers.
What is going on overall with these stats?
How should we treat the data and what can it tell us in terms of managing a discourse forum?
Are anonymous views an indication of unsolicited farmed traffic and a waste of resources?
Such traffic is filtered from Google Analytics if you select the right option. Maybe it’s not filtered on the Discourse side, and it could indicate some kind of low-level DDoS-type traffic for whatever bizarre and spurious reasons, again wasting resources and possibly affecting genuine logins?
No reports thus far of login issues.
Overall, how do we interpret the dashboard numbers?
What helped me immensely were the web crawler reports in admin, after being tipped off to their existence (I had never noticed them before). Using these I banned various crawlers, which dropped anon views. I do think crawlers come in on anon views too; I have no idea how.
The reports also give you the names of crawlers (user agents), so you can look up each crawler to see if it has value.
We have a new metric in place now which splits out anon pageviews into ‘likely human’ and ‘probably bot’ so people can think of the latter more like crawlers (which they likely should be, but aren’t identifying themselves as such).
The report is part of the stock ones and can be found at /admin/reports/consolidated_page_views_browser_detection
There’s also some other work in progress to apply this to topic view metrics as well to prevent bot-bloating.
No. If the coder of that bot has worked the way they quite often do, meaning some part of the text in the user agent, device, system, etc. is the same all the time, then you can block them totally, but you need a reverse proxy. robots.txt is only a guideline for well-behaved bots.
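To illustrate the idea of matching on a part of the user agent that stays the same, here is a minimal Python sketch. The substrings in `BLOCKED_UA_PARTS` and the function name are made up for illustration; in practice the equivalent rule would live in the reverse proxy rather than in a script like this.

```python
# Hypothetical substrings for illustration only; use values seen in your own logs.
BLOCKED_UA_PARTS = ("examplebot", "badscraper")


def is_blocked(user_agent: str) -> bool:
    """Return True if the user agent contains any blocked substring (case-insensitive)."""
    ua = user_agent.lower()
    return any(part in ua for part in BLOCKED_UA_PARTS)


if __name__ == "__main__":
    samples = (
        "Mozilla/5.0 (compatible; ExampleBot/1.2)",
        "Mozilla/5.0 (X11; Linux x86_64)",
    )
    for ua in samples:
        print(ua, "->", "block" if is_blocked(ua) else "allow")
```

This only catches bots that keep a stable string; anything that rotates or fakes a browser user agent will pass straight through.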
Discourse’s logs give more or less just a glimpse of the big picture. Detailed data like that you must dig out of the Nginx logs, meaning: welcome to the console.
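As a rough sketch of what digging through the Nginx logs can look like, the Python below counts requests per user agent. It assumes the lines are in Nginx’s default "combined" format; if your install uses a different `log_format`, the regex needs adjusting. The sample lines and the `top_user_agents` name are invented for the example.

```python
import re
from collections import Counter

# Matches Nginx's default "combined" log format; adjust if your log_format differs.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def top_user_agents(lines, n=10):
    """Count requests per user agent; return the n most common as (ua, count) pairs."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match:  # lines that do not fit the expected format are skipped
            counts[match.group("ua")] += 1
    return counts.most_common(n)


# Made-up sample lines; in real use, pass an open access log file instead.
SAMPLE = [
    '203.0.113.5 - - [17/Aug/2024:10:00:00 +0000] "GET /latest HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [17/Aug/2024:10:00:01 +0000] "GET /t/1 HTTP/1.1" 200 900 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [17/Aug/2024:10:00:02 +0000] "GET / HTTP/1.1" 200 300 "-" "Mozilla/5.0"',
]

for ua, count in top_user_agents(SAMPLE):
    print(f"{count:6d}  {ua}")
```

Because the function accepts any iterable of lines, an open file handle works as well as a list, so large logs do not have to be read into memory first.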
WordPress can easily be brought to its knees by bots, but with Discourse the situation is more just annoying. Content stealing is today’s norm, and has been for a long time now.
A reverse proxy seems like a good first step. Is Cloudflare good for that?
I know a local web-developer buddy who said that using Cloudflare nameservers can be good for security.
I’m not too concerned about published content being ‘stolen’. When text is published in public, people have a right to record it as long as they aren’t trying to sell it as their own creation, which would then become a problem.
I had an unusual spike of 152 bot “other pageviews” yesterday, August 17th; very random for a mostly inactive site that usually has only about 15-20 of those a day.
Totally normal. I got the best results by combining blocking the worst user agents with geo-blocking (mine isn’t a global forum, so I can easily do it).
Yeah. Right now I would get a lot of traffic from Russia, Singapore and China. Earlier it was India, Pakistan, Egypt, Iran and Iraq. And I bet they can’t speak Finnish. It is possible with Russia, though, but… no.
The biggest three are the USA, France and the Netherlands, and Germany is growing. But that is because of data centers, and that’s why I can’t ban those.
But again, with Discourse those are mainly just annoying. With WordPress (and other LAMP stacks, I would say) those create such a big load that the situation starts to be closer to a DDoS.
And most are from stupid script kiddies that try to knock Discourse down using ancient WordPress issues.
But nowadays SEO and AI bots have started to be a real question mark.
But if one has a local forum, then geo-banning is just a wise move.
I have seen what I suspect is AI-enabled bot traffic that was closing in on DDoS-level disruption, as the Discourse service was starting to complain.
It’s not a highly powered setup, but for expected normal demand there is normally some headroom.
This time it showed up as huge anonymous and “other” traffic.
This mapped perfectly to the increased server CPU, Load and Disk I/O stats.
As a user here I got a lot of flak and many (temp) bans for decrying the wildly enthusiastic adoption of AI, which is now well and truly coming back to bite in so many ways (like job losses), and now this, which may be a continuation of the OP and is nothing but the latest AI-enabled web bot traffic making itself known. Oh boy.
Back then my view was that it was (also) the time to be thinking about all strategies to mitigate for the customer/end user, not simply joining the arms race as a sub-partner. That Musk style of logic, “if you can’t beat ’em, join ’em”, is in this instance easy to say but not the correct option, and the call for regulation naive.
Stand back?
Maybe too late now.
The AI traffic may come in looking more human-like. Technically I do not know how that works (but I know how we got here), other than that it probably passes itself off as human traffic more easily and presents less detectable traffic that also looks desirable from Google’s point of view. But oh dear, this may be a bigger new problem.
Nothing is ever FREE. I dunno how many (again) got so blindsided by this and did not apply human-level caution and choose a stand-back option.
Right now that traffic still comes in from very specific regions, and even ASN blocks are enough to surgically take out the heat.
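An ASN block boils down to refusing every IP prefix that network announces. A minimal sketch of that check in Python follows; the prefixes in `BLOCKED_PREFIXES` are placeholders from the documentation-reserved ranges, not real ASN data, and the function name is made up. A real setup would do this at the firewall or proxy with the prefix list for the ASN in question.

```python
import ipaddress

# Placeholder prefixes (documentation-reserved ranges), standing in for an ASN's announcements.
BLOCKED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_in_blocked_prefix(ip: str) -> bool:
    """Return True if the address falls inside any blocked prefix."""
    addr = ipaddress.ip_address(ip)
    # Compare only against prefixes of the same IP version (IPv4 vs IPv6).
    return any(
        addr.version == net.version and addr in net for net in BLOCKED_PREFIXES
    )


for candidate in ("192.0.2.77", "203.0.113.9", "2001:db8::1"):
    print(candidate, "->", "block" if is_in_blocked_prefix(candidate) else "allow")
```

A linear scan like this is fine for a handful of prefixes; a large list would want a proper prefix lookup structure or, more realistically, the firewall’s own rule engine.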
This is pretty normal. I run a bunch of sites and Cloudflare usually shows about 10x to 30x my real traffic. If they don’t trigger analytics, they are bots or search engine crawlers, as most bots will not run the JavaScript used for analytics.
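To put the 10x to 30x figure into perspective, here is a small Python sketch that turns a proxy-side count and an analytics count into a rough share of traffic that never ran the analytics JavaScript. It treats the two numbers as directly comparable, which is a simplification, and the function name and the counts are invented for illustration.

```python
def estimated_bot_share(proxy_count: float, analytics_count: float) -> float:
    """Fraction of proxy-side traffic that never showed up in JavaScript analytics."""
    if proxy_count <= 0:
        raise ValueError("proxy_count must be positive")
    # Clamp at 0 in case analytics reports more than the proxy saw.
    return max(0.0, 1.0 - analytics_count / proxy_count)


# Illustrative numbers only: 1000 analytics hits against 10x and 30x that at the proxy.
for multiplier in (10, 30):
    share = estimated_bot_share(proxy_count=multiplier * 1000, analytics_count=1000)
    print(f"{multiplier}x -> about {share:.0%} not seen by analytics")
```

The result is best read as a ceiling on bot traffic rather than a measurement, since real visitors with blocked scripts also fall into the "not seen by analytics" bucket.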
If you’re really worried, get Cloudflare and firewall the offending countries. If your IP was already exposed in the DNS, get a new IP address. That is, if you are being attacked.
Indeed, the server was already on the CF DNS but not proxied, as I still thought, from old setup advice, that it did not work. You know, the fear of the orange cloud is strong.
However, I tried it out during one of the waves and mitigated the volume relatively easily after some watching. It does seem to have stripped out a lot more traffic besides.
Is the only way to get a new IP address to move to a new server?
Depends on your hosting service. Some, like DigitalOcean, can just assign a new static IP address in the dashboard; with some you may need to ask them. I never turn it off. If I turn the orange cloud off, I consider that IP compromised. If you lose traffic from turning it on, your SSL setting is likely not set right, or caching isn’t right. Doing live swaps to Cloudflare can be tricky if you don’t already have the SSL dialed in, as it’s hard to get an uncached IP address from the DNS to test with.