Discourse tracks read time for every post users see on the screen. This system has evolved over the years and I find I often need to refer back to the code to figure out how this works and why it exists.
This post covers the painful technical details of the current implementation.
How the Discourse client tracks timing
Post timing tracking is implemented in screen-track.js.es6. This module is responsible for tracking how long a post has been on the screen and how long the topic has been on screen.
When a topic page scrolls, it informs the screen tracker which posts are in view and which of those posts have been read. Partially visible posts count as “in view”.
Screen track then fires a “tick” every second that decides what data needs to be sent to the server.
The screen tracker keeps track of three lists:
a. A list of (post, time spent reading post) pairs that have not been sent to the server
b. A list of posts that we know were read
c. A list of posts that we know are on the screen right now
At the start of a tick (every second), if we have posts in (a), we will consider sending them to the server:
- If SiteSetting flush_timing_secs (default 60 secs) has passed since the last time we sent data to the server.
- If any of the posts are “unread” by the user, we will send the entire list right away.
At the end of a tick, if Discourse has focus:
- If we have any “posts on the screen”, we will log “1 tick” of time for each post.
If at any point we leave the topic (navigate to another place in Discourse):
- We will send everything that is “in flight” in (a) right away to the server.
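The tick logic described above can be sketched roughly as follows. This is a simplified Python model for illustration, not the actual screen-track.js.es6 code: the class name `ScreenTrackSketch`, its method names, and the `send` callback are all hypothetical, and marking posts as read once their timings are sent is an assumption. Only the 60-second flush_timing_secs default comes from the description above.

```python
from typing import Callable, Dict, Set

FLUSH_TIMING_SECS = 60  # default of the flush_timing_secs site setting


class ScreenTrackSketch:
    """Hypothetical model of the per-second tick; not the real client code."""

    def __init__(self, send: Callable[[Dict[int, int]], None]):
        self.send = send                    # stand-in for the request to the server
        self.pending: Dict[int, int] = {}   # (a) post number -> unsent seconds
        self.read: Set[int] = set()         # (b) posts we know were read
        self.on_screen: Set[int] = set()    # (c) posts on screen right now
        self.secs_since_flush = 0

    def scrolled(self, on_screen: Set[int], read: Set[int]) -> None:
        """Called by the topic page whenever it scrolls."""
        self.on_screen = set(on_screen)
        self.read |= set(read)

    def flush(self) -> None:
        """Send list (a) to the server and start a new flush window."""
        if self.pending:
            self.send(dict(self.pending))
            # Assumption: once timings are sent, those posts count as read.
            self.read |= set(self.pending)
            self.pending.clear()
        self.secs_since_flush = 0

    def tick(self, has_focus: bool = True) -> None:
        """One tick, fired every second."""
        self.secs_since_flush += 1
        # Start of tick: decide whether list (a) should be sent.
        if self.pending:
            has_unread = any(p not in self.read for p in self.pending)
            if has_unread or self.secs_since_flush >= FLUSH_TIMING_SECS:
                self.flush()
        # End of tick: log one second for every post currently on screen.
        if has_focus:
            for post in self.on_screen:
                self.pending[post] = self.pending.get(post, 0) + 1

    def left_topic(self) -> None:
        """Navigating away sends everything still pending right away."""
        self.flush()
```

In this model, an unread post that lands on screen is reported within about a second, while posts that are already read are batched until the flush interval elapses or the user leaves the topic.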
Limits
- Each time you look at a topic we will log a maximum of 6 minutes reading time per post (this will reset if you navigate away and back to the post).
- If 3 minutes pass and you have not scrolled at all, we disable this subsystem until stuff scrolls again.
- We will log timing for up to 5 topics for anonymous users (this is converted to data in the post_timings table when the user signs up).
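A minimal sketch of how the first two limits could interact with the per-second tick, written in Python for brevity. The constants mirror the numbers listed above. The class and method names are hypothetical and are not taken from the Discourse client.

```python
MAX_SECS_PER_POST = 6 * 60   # cap per post for one visit to the topic
IDLE_PAUSE_SECS = 3 * 60     # stop tracking after this long without a scroll


class LimitedTimer:
    """Hypothetical helper that applies the limits to per-second ticks."""

    def __init__(self):
        self.total = {}              # post number -> seconds logged this visit
        self.secs_since_scroll = 0

    def scrolled(self):
        # Any scroll re-enables tracking.
        self.secs_since_scroll = 0

    def tick(self, on_screen):
        """Return the posts that actually earn a second on this tick."""
        self.secs_since_scroll += 1
        if self.secs_since_scroll > IDLE_PAUSE_SECS:
            return []                # paused until the next scroll
        credited = []
        for post in on_screen:
            if self.total.get(post, 0) < MAX_SECS_PER_POST:
                self.total[post] = self.total.get(post, 0) + 1
                credited.append(post)
        return credited
```

Creating a fresh `LimitedTimer` on each visit to the topic would model the reset that happens when you navigate away and come back.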
Key observations
- Even though the post_timings table tracks down to the millisecond, we have between 0 and 1000 ms of “unlogged” time per post, depending on when the tick fires.
- Each “session” of looking at a topic can log up to 6 minutes of read time per post. There is no upper limit on total read time per post: a post can be read for days by a user if the user keeps returning to the topic.
What do we do with this data?
The most critical piece of information we use is “did user X read post Y”. This determines unread counts on topics and tons of other critical data.
Beyond this binary use, we use the time logged in post_timings to calculate avg_time for a post.
Average time for a post is calculated as the exponential of the average of the natural log of each logged time (aka the geometric mean).
So for example:
Post 1: sam, 10 seconds
Post 2: jane, 1 hour
avg_time = exp((log(3600000) + log(10000)) / 2)
=~ exp((15.09 + 9.2) / 2)
=~ 189094
=~ 189 seconds
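The same calculation can be checked with a few lines of Python. This is only an illustration of the geometric mean, not the query Discourse runs, and the function name `geometric_mean_ms` is made up for this example. Without the intermediate rounding, the result is about 189.7 seconds, slightly above the hand-rounded figure.

```python
import math


def geometric_mean_ms(timings_ms):
    """exp of the mean of the natural logs of the timings (all must be > 0)."""
    return math.exp(sum(math.log(t) for t in timings_ms) / len(timings_ms))


timings = [10_000, 3_600_000]               # sam: 10 seconds, jane: 1 hour
geo = geometric_mean_ms(timings)            # roughly 189737 ms
arith = sum(timings) / len(timings)         # 1805000 ms
print(round(geo / 1000), round(arith / 1000))   # prints "190 1805"
```

For comparison, the plain arithmetic mean of the same two timings is 1805 seconds, so a single very long reading session pulls the geometric mean up far less.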
This avg_time is then used in the score calculator as a component of the “post score”.
Score = 5 * reply_count + 15 * like_score + 5 * incoming_link_count + 2 * bookmark_count + 0.05 * avg_time + 0.2 * reads
So in the case above, 189 seconds on average reading a post translates to about 9.5 points (0.05 × 189, with avg_time counted in seconds). So… a bit less than one like (15 points), or about 47 reads.
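As a rough illustration, the weighted sum can be written as a small Python function. The weights are the ones in the formula above. The dictionary keys, the function name `post_score`, and the example counts are made up for this sketch, and avg_time is assumed to be in seconds.

```python
# Weights taken from the score formula; key names are illustrative.
WEIGHTS = {
    "reply_count": 5,
    "like_score": 15,
    "incoming_link_count": 5,
    "bookmark_count": 2,
    "avg_time": 0.05,   # assumed to be in seconds
    "reads": 0.2,
}


def post_score(stats):
    """Weighted sum of a post's stats; missing stats count as 0."""
    return sum(weight * stats.get(name, 0) for name, weight in WEIGHTS.items())


# Hypothetical post: 2 replies, like_score of 3, 1 incoming link, 40 reads.
example = {
    "reply_count": 2,
    "like_score": 3,
    "incoming_link_count": 1,
    "avg_time": 189,
    "reads": 40,
}
score = post_score(example)
```

Keeping the weights in one table makes it easy to see how each stat trades off against the others. For instance, one like carries the same weight as 75 reads.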
“Post score” is then used by “best of” to figure out what the best posts are in a topic.