Tracking of Pageviews and Unique Users at Category and Tag Level

It’s possible to get total category views from the Data Explorer by summing up the views of all topics under a category:

SELECT 
       c.id as category_id, 
       SUM(views) as "total views"
FROM categories c 
JOIN topics t ON t.category_id = c.id
WHERE read_restricted is false
GROUP BY c.id
order by sum(views) desc

Getting total views by tag is similar:

SELECT tags.name,
       sum(views)
from topics t
     join topic_tags tt on t.id = topic_id
     join tags on tags.id = tt.tag_id
group by tags.name
order by sum(views) desc

There’s also a topic_views table which could, in theory, be used to divide views by user and date. I haven’t found that table particularly useful, however, because queries time out when topics have a lot of views. It also shows a lot fewer views per topic because, if I understand it correctly, it’s not counting anonymous views. I also think maybe topic.views includes not only anonymous views but also bot traffic? It’s a lot more than what I see on GA.

Speaking of GA, it has all the data you need, but it’s not easy to group it by category or tag. The best I can do try to parse the pageTitle to find categories. I’m doing this in R using googleAnalyticsR. Follow the instructions in the manual to authorize your Google account and get the metrics you want. Be sure to include pageTitle as a dimension. My API calls look a bit like this:

ga_this_year <- ga_data(ga_id, 
                        date_range = c("2023-01-01", "2023-05-30" ),
                        metrics=c("screenPageViews","averageSessionDuration", "sessions"), 
                        dimensions = c("pageTitle", "deviceCategory")
                       )

The key to understanding the next bit is to see that pageTitles fit this general format:

Topic title - Category - Site title

If the topic is not categorized, the Category is missing. There are also a bunch of utility pages (“Latest topics”, “New topics”, etc.) that don’t have categories. (I’m not counting “Latest [category] topics” as being part of the category, though it might be better to include them.) Finally the homepage uses Site title - short site description for the page title. We don’t care about any of those, however. So the regex I’m using is:

str_extract(pageTitle,
            str_glue(".*? - (.*) - {sitename}"),
            group=1)

Putting it together into a function that groups by category, I get something like this:

category_views <- function(data, sitename = "Meta Jon") {
  data %>% 
  mutate(category = str_extract(
                                pageTitle,
                                str_glue(".*? - (.*) - {sitename}"),
                                group=1
                                )) %>% 
  group_by(category) %>% 
  summarise(views = sum(screenPageViews)) %>% 
  arrange(desc(views)) #%>% head(20)
}

category_views(ga_this_year)

(Obviously change “Meta Jon” for your own site’s name.)

I don’t currently know how to extract GA data based on tag, however.

2 Likes