I find it hard to believe that someone who has gone through threads spanning over a year has actually read all the posts when the recorded read time is 2 hours.
Any opinions on this? Do you think the member visited all the threads and then just quickly scrolled through them? Are there any protections against this kind of thing in DC (and should there be)?
I would suggest reaching out to the user via PM: introduce yourself, see how they're doing, and gauge their interest. Some community managers do this as part of their retention strategy (i.e., give a user a reason to stay and become a member).
The responses (or the lack of them, despite the user still logging in and reading) may help you gauge exactly what is going on behind those numbers.
It's probably possible to tweak how those read counts are calculated so that they use the post_timings table instead, similar to how "topics viewed" is calculated in this query (as opposed to posts_read, which currently suffers from the same issue):
tv as (
  select user_id,
         count(distinct(topic_id)) as topics_viewed
  from topic_views, t
  where viewed_at > t.start
    and viewed_at < t.end
  group by user_id
),
(In addition, a minimum on the time viewed to count as "read" could be factored in.)
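To make the idea concrete, here is a minimal sketch in Python rather than SQL. It counts a post as read only when a timing record for it meets a minimum duration. The tuple layout `(user_id, topic_id, post_number, msecs)` is an assumption loosely modelled on post_timings rows, and `MIN_READ_MSECS` is a made-up threshold, not a value Discourse uses.

```python
from collections import defaultdict

# Hypothetical threshold: timings shorter than this are treated as scrolling past.
MIN_READ_MSECS = 1500


def posts_read_per_user(timings, min_msecs=MIN_READ_MSECS):
    """Count distinct posts per user whose recorded view time meets the minimum.

    `timings` is an iterable of (user_id, topic_id, post_number, msecs) tuples;
    this shape is an assumption for illustration only.
    """
    seen = defaultdict(set)
    for user_id, topic_id, post_number, msecs in timings:
        if msecs >= min_msecs:
            # A set keeps each (topic, post) pair from being counted twice.
            seen[user_id].add((topic_id, post_number))
    return {user_id: len(posts) for user_id, posts in seen.items()}


sample = [
    (1, 10, 1, 4000),  # long enough to count
    (1, 10, 2, 300),   # scrolled past
    (1, 11, 1, 2500),  # long enough to count
    (2, 10, 1, 100),   # scrolled past
]
print(posts_read_per_user(sample))
```

Users with no qualifying timings simply do not appear in the result, so a caller would treat a missing key as zero posts read.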
I realize this probably isn't something that you'd want to "just do", since there are likely performance implications to consider. Just thinking aloud about possible ways out in the future.
It should be counting only the posts that actually qualify. If the topic is not a private message, only count regular posts (not whispers, etc.) between post numbers before_last_read and post_number, etc. And don't count more than the number of PostTiming records the user has in the topic. I'll give it a try.
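The rule described above can be sketched roughly as follows. This is an illustration in Python, not the actual Discourse code: the function name, the `(post_number, post_type)` tuples, the `"regular"` label, and the choice of an exclusive lower bound and inclusive upper bound are all assumptions. The source also does not say what is counted in private messages; this sketch assumes every post type counts there.

```python
def newly_read_count(posts, before_last_read, post_number, timing_count,
                     private_message=False):
    """Estimate how many posts a visit should add to the user's read count.

    posts: list of (post_number, post_type) tuples for one topic (assumed shape).
    before_last_read / post_number: the range of post numbers just covered.
    timing_count: how many timing records the user has in this topic.
    """
    # Assumed bounds: posts after before_last_read, up to and including post_number.
    in_range = [
        (number, kind) for number, kind in posts
        if before_last_read < number <= post_number
    ]
    if not private_message:
        # Outside private messages, whispers and other non-regular posts are skipped.
        in_range = [p for p in in_range if p[1] == "regular"]
    # Never credit more posts than there are timing records for the user.
    return min(len(in_range), timing_count)


topic_posts = [
    (1, "regular"),
    (2, "whisper"),
    (3, "regular"),
    (4, "regular"),
    (5, "small_action"),
]
print(newly_read_count(topic_posts, before_last_read=1, post_number=5, timing_count=10))
```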
We had the data when storing post timings, but weren't using it. There are still a few cases where the count won't be 100% accurate, but it's so much better than counting every post in a topic as read.
As for repairing existing stats… Maybe we don't?
On repairing stats… looking at topics entered vs posts read on a popular Discourse, I see:
I think it might be safe to cap only those users whose average posts read per topic is greater than 50, as those are very likely to be erroneous outliers, and to ignore all other users, who are mostly in range.
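As a rough sketch of that repair rule in Python: the threshold of 50 comes from the post above, but what value to cap to is not stated, so capping at `topics_entered * 50` is an assumption here, as are the function and parameter names.

```python
# Threshold from the discussion: averages above this are treated as outliers.
MAX_AVG_POSTS_PER_TOPIC = 50


def capped_posts_read(posts_read, topics_entered, max_avg=MAX_AVG_POSTS_PER_TOPIC):
    """Return a repaired posts-read value for one user.

    Users at or below the average threshold are left untouched. Outliers are
    capped at topics_entered * max_avg (an assumed choice of cap).
    """
    if topics_entered <= 0:
        # No meaningful average can be computed; leave the stat alone.
        return posts_read
    if posts_read / topics_entered > max_avg:
        return topics_entered * max_avg
    return posts_read


print(capped_posts_read(5000, 10))  # average of 500 per topic, so capped
print(capped_posts_read(400, 10))   # average of 40 per topic, left as is
```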
The only time that might get weird is someone who only entered 2 giant topics but actually read every post in them, and this seems… unlikely.
The only way that I can think of to get a more accurate read count is to only count posts that have been displayed within the y-axis boundaries (and even then, displayed does not necessarily equate to read).
However, tracking y-axis coordinates would be very expensive, and IMHO the cost wouldn't be worth any benefit it might give in terms of being more accurate. Or does the "blue read dot" already do this?
Similar to the "time taken to post", factoring in a "time taken to read" would not be 100% perfect, but as long as it has a minimum high enough to account for a valid "skim", it would be an improvement and a fair compromise.
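One possible way to pick such a minimum, sketched in Python, is to scale it with the length of the post. The skim rate of 700 words per minute and the 500 ms floor are invented numbers for illustration, and both function names are hypothetical.

```python
# Hypothetical values: a generous skimming speed and an absolute lower bound.
SKIM_WORDS_PER_MINUTE = 700
FLOOR_MSECS = 500


def min_read_msecs(word_count, wpm=SKIM_WORDS_PER_MINUTE, floor=FLOOR_MSECS):
    """Minimum view time, in milliseconds, for a post of this length to count."""
    needed = word_count * 60_000 // wpm  # time to skim the post at the given rate
    return max(floor, needed)


def counts_as_read(word_count, msecs_viewed):
    """True when the recorded view time is at least a plausible skim."""
    return msecs_viewed >= min_read_msecs(word_count)


print(min_read_msecs(70))        # a 70-word post
print(counts_as_read(70, 2000))  # viewed for two seconds
```

A length-scaled minimum avoids penalising quick reads of one-line replies while still filtering out long posts that were only scrolled past.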