Topic_views vs Topic.views

When helping a client with a data explorer query, they noticed a difference between the topic views returned by the query, and the number on the category topic list.

Where does the difference between

SELECT COUNT(*) FROM topic_views WHERE topic_id = X

and

Topic.find(X).views

come from?

4 Likes

Topic.views is integer count which includes both user and anonymous traffic.

SELECT COUNT(*) FROM topic_views WHERE topic_id = X is used to track topic views by registered users. It will have below details.

https://github.com/discourse/discourse/blob/cb824a6b338b32216baef9021fe508a290e99bbe/app/models/topic_view_item.rb#L59-L62

10 Likes

No, that is not entirely true. topic_views registers anonymous views as well, in those cases user_id is null.

So what’s I think that’s left is that Topic.views counts for anonymous users are counted once per topic_view_duration_hours (default: 8) hours per IP, and topic_views for anonymous users are counted once per IP?

4 Likes

Yes I can see now. As per the index uniq_ip_or_user_id_topic_views either user id or ip address is required.

2 Likes

So what is the intended use of topic_views ?

For registering statistics about topic views it would make sense if it used the same rules as Topic.views.

For registering which user saw what topic it wouldn’t make sense to register IP addresses for anonymous visitors (which is a left over GDPR toxic item anyway)

EDIT: it seems like TopicViewItem is only being used for registering which user/IP saw what topic, and that is only used in TL3 requirement calculations. So that would mean that we could (optionally) get rid of registering IP addresses for anonymous visitors and be a bit more GDPR compliant? Or am I going too quickly now?

3 Likes

In a 20 users private community a topic can have 100 views count. Since new topic visits after topic_view_duration_hours are calculated as new view count. But select count(*) from topic_views where topic_id = X will never exceed the count 20. That’s why Topic.views count is different. This is applicable for open communities too.

I’m not sure here. I will check it on Monday.

4 Likes

Sounds about right.

Long term I don’t even see much value in carrying topic views in the raw form in the topic_views table for longer than a few months.

4 Likes

Did you check already?

It’s already discussed in above topics.

3 Likes

It’s discussed there, but there is no solution. IP addresses for anonymous visitors are still being stored.

So I guess we’re back at my former statement:

I don’t think we have quite gotten around to it, but it is definitely on our list. If you would like it to be done faster, have you considered contributing a PR to get it done?

5 Likes

Absolutely, but first I wanted to have some kind of confirmation that changing this would be actually welcome.

To sum it all up, I think we could get rid of the ip_address column in the topic_views table altogether, and stop creating entries for anonymous users, right?

3 Likes

No we need to roll this up into a different table, I want to retain 1 day fidelity on page views

4 Likes

I’m running into this too, but on a pretty hefty scale.

The topic itself shows image.

Was it confirmed that

The only reason it’s grinding my gears is that the Topic.views is static-- You can’t do an analysis of how many views there were on this day last year, because it doesn’t record each view-- just adds to the running count.

Is there some way using another table to get a full list of views on a per-record basis?

2 Likes

When a member views a topic a row is INSERTed (or UPDATEd) into the topic_user table
https://github.com/discourse/discourse/blob/master/app/models/topic_user.rb

#  user_id                  :integer          not null
#  topic_id                 :integer          not null
#  posted                   :boolean          default(FALSE), not null
#  last_read_post_number    :integer
#  highest_seen_post_number :integer
#  last_visited_at          :datetime
#  first_visited_at         :datetime
#  notification_level       :integer          default(1), not null
#  notifications_changed_at :datetime
#  notifications_reason_id  :integer
#  total_msecs_viewed       :integer          default(0), not null
#  cleared_pinned_at        :datetime
#  id                       :integer          not null, primary key
#  last_emailed_post_number :integer
#  liked                    :boolean          default(FALSE)
#  bookmarked               :boolean          default(FALSE)

I have a feeling it would cause a resource hit and not be a good choice as a frequently run task, but you should be able to use a count of the datetimes to get something more granular than a running total.

I’d think that using topic_user would return the same count of views as the topic_views table would (which has full records) since that’s the one that stores user data-- what I’m missing is the record-level data for anonymous viewers with the same IP, but honestly, just knowing what the discrepancy there was is pretty much good enough :slightly_smiling_face:

2 Likes