Estimating historical read time for a large forum import

I’m in the process of migrating a legacy forum with about 1.3 million posts and two decades of history to Discourse.

What I'd like to avoid is every user showing 0 read time right after the migration, since many have 5, 10, and some 15+ years of history.

With Discourse’s emphasis on read time (which is great IMHO), it would be nice to do a best-effort calculation of this stat for users who have significant history, so that they don’t feel like the clock got reset on their contributions after the migration. (While gamification can be a huge inspiration and motivator, it can also be a demoralizer when stats users take pride in are wiped out.)

I realize there’s no perfect way to display data that was never tracked, but is there a script I can run, or perhaps something that can be added to an import script, that would estimate a user’s historical read time based on their current post count?

Something like:

PostCount = SELECT COUNT(*) FROM posts WHERE user_id = (ID of the user being imported)

ReadTimePerPost = 300 (seconds)

RetroactiveReadTime = PostCount * ReadTimePerPost

Would it theoretically be possible to do this and then insert that number into the database, tied to each user?

If so, where and how is the read time stored?

And BTW, valuing each post at 300 seconds of read time is just my estimate. Honestly, it is probably very low, all things considered, but much more accurate than 0.
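A minimal plain-Ruby sketch of the formula above, using the 300-second estimate. `retroactive_read_time` and `MAX_READ_TIME_SECS` are hypothetical names, and the cap is an optional safeguard that is not part of the original proposal. On the storage question, my understanding is that Discourse keeps each user's total read time, in seconds, in the `time_read` column of the `user_stats` table; treat that as an assumption and confirm it against your version's schema before writing to it.

```ruby
# Sketch of the estimate: read time = post count x seconds per post, optionally capped.
READ_TIME_PER_POST_SECS = 300       # the 300-second guess from the question
MAX_READ_TIME_SECS = 1_000 * 3600   # hypothetical cap of 1,000 hours (not a Discourse setting)

# Returns an estimated historical read time in seconds for one user.
def retroactive_read_time(post_count, secs_per_post: READ_TIME_PER_POST_SECS, cap: MAX_READ_TIME_SECS)
  raise ArgumentError, "post_count must be non-negative" if post_count.negative?

  [post_count * secs_per_post, cap].min
end

# Writing the result in Discourse (assumption; verify the column first), e.g.:
#   user.user_stat.update_columns(time_read: retroactive_read_time(post_count))
```

With the defaults, 1,000 posts gives 300,000 seconds (about 83 hours), while any estimate above the cap is clamped to 3,600,000 seconds (1,000 hours).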


I think I’d do something like

READ_TIME_MSECS = 300 * 1000 # 300 seconds per post, the estimate above

Post.where(user_id: user.id).find_each do |post| # add any other conditions to the .where
  PostTiming.create(topic_id: post.topic_id, post_number: post.post_number, user_id: user.id, msecs: READ_TIME_MSECS)
end
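With roughly 1.3 million posts, creating `PostTiming` records one at a time may be slow. A minimal sketch, assuming a bulk insert is acceptable for your import: the hypothetical `each_post_timing_batch` helper groups a user's posts into attribute hashes, and the `ImportedPost` Struct stands in for Discourse's `Post` model so the sketch runs on its own.

```ruby
# Sketch: build PostTiming-style rows in batches for a bulk insert.
# ImportedPost stands in for Discourse's Post model (hypothetical, for this sketch only).
ImportedPost = Struct.new(:topic_id, :post_number, keyword_init: true)

READ_TIME_MSECS = 300 * 1000 # 300 seconds per post, the estimate from the question

# Yields arrays of at most batch_size attribute hashes, one hash per post.
def each_post_timing_batch(posts, user_id, batch_size: 1_000)
  posts.each_slice(batch_size) do |slice|
    rows = slice.map do |post|
      { topic_id: post.topic_id, post_number: post.post_number, user_id: user_id, msecs: READ_TIME_MSECS }
    end
    yield rows
  end
end

# In a real import script this might look like (assumption, untested against Discourse):
#   each_post_timing_batch(Post.where(user_id: user.id).find_each, user.id) do |rows|
#     PostTiming.insert_all(rows)
#   end
```

Passing each batch to ActiveRecord's `insert_all` (Rails 6+) skips model validations and callbacks, so whether that is safe for `post_timings` in your Discourse version is an assumption worth checking first.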

If I’m doing the arithmetic right, 300 seconds is 5 minutes. Do you think that it took 5 minutes to read and respond to this post?
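For scale, applying the 300-second figure to all of the roughly 1.3 million imported posts gives this total across the whole forum:

```latex
1{,}300{,}000 \times 300\ \text{s} = 3.9 \times 10^{8}\ \text{s} \approx 108{,}333\ \text{h} \approx 4{,}514\ \text{days} \approx 12.4\ \text{years}
```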


I haven’t taken the time to think through the timing yet. What I was thinking is that read time doesn’t only count reading that led to a post. Many people read a lot and spend a lot of time skimming, which wouldn’t be accounted for at all if the estimate were based only on the time it takes to read and post.

Granted, I’d rather underestimate than overestimate.

I’ve done scores of imports. No one has ever asked me to make up bogus read times. I bet people will understand that the stats start from when you switched to Discourse.


A much more valuable use of your time is probably identifying what automatic threshold you should use for granting imported users TL2, and hand-picking the people in your community who you trust to have good judgement regarding titles and categorization to start out as TL3.

TL promotion is the primary use of read time statistics that could plausibly matter for an import.
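A minimal sketch of picking TL2 candidates from imported data. `ImportedUser`, `tl2_candidates`, and both thresholds are hypothetical placeholders to tune for your community, not Discourse's built-in TL2 requirements; TL3 would still be hand-picked as suggested above.

```ruby
require "date"

# Stand-in for a user record from the legacy forum (hypothetical fields).
ImportedUser = Struct.new(:username, :legacy_post_count, :joined_on, keyword_init: true)

MIN_LEGACY_POSTS = 50 # hypothetical threshold
MIN_YEARS_ACTIVE = 2  # hypothetical threshold

# Returns the users who meet both thresholds as of the given date.
def tl2_candidates(users, as_of: Date.today)
  users.select do |u|
    years = (as_of - u.joined_on) / 365.25 # Date difference is in days
    u.legacy_post_count >= MIN_LEGACY_POSTS && years >= MIN_YEARS_ACTIVE
  end
end
```

The resulting list could then be reviewed before granting TL2, either in the import script or through the admin UI.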
