Topic_views vs Topic.views


(Richard - DiscourseHosting.com) #1

When helping a client with a data explorer query, they noticed a difference between the topic views returned by the query, and the number on the category topic list.

Where does the difference between

SELECT COUNT(*) FROM topic_views WHERE topic_id = X

and

Topic.find(X).views

come from?


(Vinoth Kannan) #2

Topic.views is integer count which includes both user and anonymous traffic.

SELECT COUNT(*) FROM topic_views WHERE topic_id = X is used to track topic views by registered users. It will have below details.


(Richard - DiscourseHosting.com) #3

No, that is not entirely true. topic_views registers anonymous views as well, in those cases user_id is null.

So what’s I think that’s left is that Topic.views counts for anonymous users are counted once per topic_view_duration_hours (default: 8) hours per IP, and topic_views for anonymous users are counted once per IP?


(Vinoth Kannan) #4

Yes I can see now. As per the index uniq_ip_or_user_id_topic_views either user id or ip address is required.


(Richard - DiscourseHosting.com) #5

So what is the intended use of topic_views ?

For registering statistics about topic views it would make sense if it used the same rules as Topic.views.

For registering which user saw what topic it wouldn’t make sense to register IP addresses for anonymous visitors (which is a left over GDPR toxic item anyway)

EDIT: it seems like TopicViewItem is only being used for registering which user/IP saw what topic, and that is only used in TL3 requirement calculations. So that would mean that we could (optionally) get rid of registering IP addresses for anonymous visitors and be a bit more GDPR compliant? Or am I going too quickly now?


(Vinoth Kannan) #6

In a 20 users private community a topic can have 100 views count. Since new topic visits after topic_view_duration_hours are calculated as new view count. But select count(*) from topic_views where topic_id = X will never exceed the count 20. That’s why Topic.views count is different. This is applicable for open communities too.

I’m not sure here. I will check it on Monday.


(Sam Saffron) #7

Sounds about right.

Long term I don’t even see much value in carrying topic views in the raw form in the topic_views table for longer than a few months.


(Richard - DiscourseHosting.com) #8

Did you check already?


(Vinoth Kannan) #9

It’s already discussed in above topics.


(Richard - DiscourseHosting.com) #10

It’s discussed there, but there is no solution. IP addresses for anonymous visitors are still being stored.

So I guess we’re back at my former statement:


(Jeff Atwood) #11

I don’t think we have quite gotten around to it, but it is definitely on our list. If you would like it to be done faster, have you considered contributing a PR to get it done?


(Richard - DiscourseHosting.com) #12

Absolutely, but first I wanted to have some kind of confirmation that changing this would be actually welcome.

To sum it all up, I think we could get rid of the ip_address column in the topic_views table altogether, and stop creating entries for anonymous users, right?


(Sam Saffron) #13

No we need to roll this up into a different table, I want to retain 1 day fidelity on page views