When helping a client with a data explorer query, they noticed a difference between the topic views returned by the query, and the number on the category topic list.
Where does the difference between
SELECT COUNT(*) FROM topic_views WHERE topic_id = X
No, that is not entirely true. topic_views registers anonymous views as well, in those cases user_id is null.
So what’s I think that’s left is that Topic.views counts for anonymous users are counted once per topic_view_duration_hours (default: 8) hours per IP, and topic_views for anonymous users are counted once per IP?
For registering statistics about topic views it would make sense if it used the same rules as Topic.views.
For registering which user saw what topic it wouldn’t make sense to register IP addresses for anonymous visitors (which is a left over GDPR toxic item anyway)
EDIT: it seems like TopicViewItem is only being used for registering which user/IP saw what topic, and that is only used in TL3 requirement calculations. So that would mean that we could (optionally) get rid of registering IP addresses for anonymous visitors and be a bit more GDPR compliant? Or am I going too quickly now?
In a 20 users private community a topic can have 100 views count. Since new topic visits after topic_view_duration_hours are calculated as new view count. But select count(*) from topic_views where topic_id = X will never exceed the count 20. That’s why Topic.views count is different. This is applicable for open communities too.
I don’t think we have quite gotten around to it, but it is definitely on our list. If you would like it to be done faster, have you considered contributing a PR to get it done?
Absolutely, but first I wanted to have some kind of confirmation that changing this would be actually welcome.
To sum it all up, I think we could get rid of the ip_address column in the topic_views table altogether, and stop creating entries for anonymous users, right?
The only reason it’s grinding my gears is that the Topic.views is static-- You can’t do an analysis of how many views there were on this day last year, because it doesn’t record each view-- just adds to the running count.
Is there some way using another table to get a full list of views on a per-record basis?
# user_id :integer not null
# topic_id :integer not null
# posted :boolean default(FALSE), not null
# last_read_post_number :integer
# highest_seen_post_number :integer
# last_visited_at :datetime
# first_visited_at :datetime
# notification_level :integer default(1), not null
# notifications_changed_at :datetime
# notifications_reason_id :integer
# total_msecs_viewed :integer default(0), not null
# cleared_pinned_at :datetime
# id :integer not null, primary key
# last_emailed_post_number :integer
# liked :boolean default(FALSE)
# bookmarked :boolean default(FALSE)
I have a feeling it would cause a resource hit and not be a good choice as a frequently run task, but you should be able to use a count of the datetimes to get something more granular than a running total.
I’d think that using topic_user would return the same count of views as the topic_views table would (which has full records) since that’s the one that stores user data-- what I’m missing is the record-level data for anonymous viewers with the same IP, but honestly, just knowing what the discrepancy there was is pretty much good enough