Consolidated Pageviews accuracy

Not sure if this really relates to the consolidated view or a more basic issue, but I’m seeing the stats claim a lot of “anonymous user” page views on my site, which ONLY permits access by logged-in users. Other than pages like privacy policy, TOS, etc. which I’d like to block from non-logged in users (but can’t in Discourse currently), no other pages are accessible at all unless you’re logged in. The crawler count is similarly suspect, but with a much smaller actual count.

It’s not related to the new report, which uses the same data you have been able to find in other reports for a long time.

Concerning other questions:

What’s your basis to consider it suspect?

Even on a login-required site, some pages are still accessible, as you said, especially /login. I can’t tell you exactly what’s going on with your instance without any data, but this is what I see on a similar instance:

I was just wondering about a similar thing. I just updated to beta 9 and saw the following:

It’s telling me I have way more logged in users than even exist on my forum. Where’s this data gathered from?

That is logged-in users’ PAGEVIEWS, not the number of users; one user can have hundreds of pageviews in one day.
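
To make the difference concrete, here is a minimal sketch with made-up data. The usernames and the `summarize` helper are purely illustrative and are not how Discourse stores or computes its stats:

```python
from collections import Counter

# Hypothetical request log: one entry per page request, keyed by username.
pageviews = ["alice", "bob", "alice", "alice", "carol", "bob", "alice"]

def summarize(views):
    """Return (total pageviews, distinct users) for a list of per-request usernames."""
    per_user = Counter(views)
    return sum(per_user.values()), len(per_user)

total, users = summarize(pageviews)
print(f"{total} pageviews from {users} logged-in users")
```

The report shows the first number, so it can easily exceed the number of registered accounts.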

Cheers. 🤦 That’s what I get for not reading carefully enough. Stupid mistake on my part.

Apparently that text is not that intuitive.

Would

from Crawlers
from Logged in users
from Anonymous cowards

look simpler to understand, @10ColorsTenkara and @L30110?

Why, I never! How dare you sir! :wink:

After a lot of years, that title still makes me giggle a little :blush:

Yes, I like that. Good suggestion.

I’ve a similar query. I’m seeing a consistent 2000ish crawler hits per day. Is this all the login page? I likewise allow access only to logged in users:

Very likely yes. Run the crawler report in Admin, Reports to see details: /admin/reports/web_crawlers

Mystery solved. The ops team put an uptime checker on the site:

LogicMonitor SiteMonitor/1.0: 50339

On a login-required site that I manage I’m getting ~80K crawler pageviews per day.

I noticed that sometime in late September traffic jumped significantly, and I have wondered what the explanation was.

I suppose I should go look at the NGINX logs and see if I can figure out what the heck is hitting the site 80K times a day.
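
For anyone wanting to do the same, a rough sketch of tallying user agents from an access log follows. It assumes NGINX’s default "combined" log format, where the user agent is the last double-quoted field; a custom `log_format` would need a different pattern. The sample lines and the `count_user_agents` name are made up for illustration:

```python
import re
from collections import Counter

# Assumption: "combined" log format, user agent is the last quoted field.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(lines):
    """Count requests per user agent; a missing agent is logged as "-"."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up sample lines, not real traffic.
sample = [
    '10.0.0.5 - - [01/Nov/2018:00:00:01 +0000] "GET /login HTTP/1.1" 200 512 "-" "-"',
    '10.0.0.5 - - [01/Nov/2018:00:00:02 +0000] "GET /login HTTP/1.1" 200 512 "-" "-"',
    '203.0.113.9 - - [01/Nov/2018:00:00:03 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

# Print the busiest user agents first.
for agent, hits in count_user_agents(sample).most_common():
    print(hits, agent)
```

Passing an open file handle for the real access log instead of `sample` works the same way, since the function only iterates over lines.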

Did you look at the report Jeff linked two posts before yours?

Oh. Golly. No, I had not noticed. :astonished:

Oh! It’s not Uptime Robot (that’d be just 24 * 60 = 1440 requests a day), but HAProxy doing an http-keep-alive. But 24 hours * 60 minutes * 2 instances * 2/minute is only 5760, which is a far cry from 80K. The Web Crawler page for yesterday and today shows 140K requests from “-”.

OK. (* 24 60 60) is 86400, so that’s what’s happening: roughly one request per second.
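
Spelled out as a quick sketch (the `checks_per_day` helper is just an illustrative name), the three figures compare like this:

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60

def checks_per_day(instances, per_minute):
    """Daily request count for a number of checkers each polling per_minute times a minute."""
    return MINUTES_PER_DAY * instances * per_minute

print(checks_per_day(1, 1))  # one checker, once a minute
print(checks_per_day(2, 2))  # two instances, twice a minute each
print(SECONDS_PER_DAY)       # one request every second
```

Only the once-per-second rate lands in the same range as the ~80K per day in the report.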

I don’t see how haproxy is making that many requests, but maybe that’s because I don’t know what I’m doing.

Is there a way to filter those? E.g., not count requests from a certain IP?

We allow blacklisting through user agent: /admin/site_settings/category/all_results?filter=crawler

Ah! That’s a start, but it seems that neither haproxy nor uptime robot is sending a User Agent (?). And if I were to blacklist them, they’d not be able to access, right?

I want haproxy and uptime robot to be able to see that the site is up without counting their checks as page views.

Everyone should be sending a user-agent… so if they aren’t, that’s not just a bug but kinda designing for evil.

For me it shows 227 logged in users, but I have fewer than 100 registered on the website. Am I interpreting it in a wrong way?

As noted above, that number is pageviews from logged-in users, not the number of logged-in users that day.
