Counting Pageviews for users (non-staff, non-crawler)

Hey guys,

I don’t see this data in the reports. I’m using the Data Explorer plugin, what query would give me the Pageview counts for only our users per week, excluding the crawler activity and the staff members of our forum, but including the anonymous pageviews.

  • Would something like this work?
  • Would it consider anonymous views?
  • Also, is there a way to convert this into a report?
-- [params]
-- string :date = '2020-01-01'

WITH t AS (
    SELECT date(:date) AS START
),
f AS (
    SELECT id as user_id
    FROM users u
    WHERE u.moderator = 'True' or u.admin = 'True'
)

    SELECT
    date_part('year', viewed_at) as year,
    date_part('week', viewed_at) as week,
    COUNT(topic_id) AS topics_viewed
    FROM topic_views, t
    WHERE viewed_at > t.START
        AND user_id not in (select f.user_id from f)
    GROUP BY year, week
    ORDER BY year, week
2 Likes

The query looks like it should work.

Yes, as it is written it is counting anonymous views. For anonymous topic views, the user_id will be NULL and the ip_address will be NOT NULL. You could exclude anonymous views by adding the following condition to the query:

AND user_id IS NOT NULL

That could be done through a plugin. The Discourse Solved plugin adds a report to the admin dashboard. There could be some code in that plugin that’s useful as an example of how to add a report: https://github.com/discourse/discourse-solved/blob/master/plugin.rb#L320.

3 Likes

So, something like this? So, would the IP address be null for crawlers?

I want to include:

  • anonymous viewers
  • logged in users

I want to exclude:

  • crawlers
  • staff users
-- [params]
-- string :date = '2020-01-01'
-- boolean :staff = false

WITH t AS (
    SELECT date(:date) AS START
),
f AS (
    SELECT id as user_id
    FROM users u
    WHERE u.moderator = 'True' or u.admin = 'True'
)

    SELECT
    user_id,
--    date_part('year', viewed_at) as year,
--    date_part('week', viewed_at) as week,
    COUNT(topic_id) AS topics_viewed
    FROM topic_views, t
    WHERE viewed_at > t.START
    AND ((ip_address is not null AND user_id is NULL) OR
        user_id not in (select f.user_id from f))
    GROUP BY user_id
    ORDER BY topics_viewed DESC
    -- GROUP BY year, week
    -- ORDER BY year, week

The above is wrong. It is undercounting anonymous users when compared to the Discourse report’s consolidated pageviews.

Looks like I might need to consider some other tables too?

1 Like

The Page Views report queries the application_requests table. That table doesn’t give any data for filtering out staff views though. What it returns is a sum of the number of requests of each type per day. The req_type is returned as an integer. The integers are mapped to the type of request with the following rules:

"http_total"=>0,
"http_2xx"=>1,
"http_background"=>2,
"http_3xx"=>3,
"http_4xx"=>4,
"http_5xx"=>5,
"page_view_crawler"=>6,
"page_view_logged_in"=>7,
"page_view_anon"=>8,
"page_view_logged_in_mobile"=>9,
"page_view_anon_mobile"=>10

The topic_views table that your query is using returns a maximum of one entry per user or IP address per day per topic. This data can’t be used for tracking page views - there could be multiple application requests for a single topic view.

4 Likes

I see. So, essentially excluding staff users is not possible to get accurate PV numbers.

Taking a long shot here, is it possible to exclude certain categories (I’m guessing not)? We have some private categories which are team internal only – and don’t want those polluting our user PV numbers.

I am not seeing any way to exclude staff from the data that is provided by the application_requests table. It is also not possible to filter the results of that table by category. If you are trying to get data about your site’s usage, the topic_views table that you’re using in your original query will give you accurate data about how your site is being used, but that data will not match what you are seeing on the Page Views report.

4 Likes

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.