The top page in Discourse is intended for two cases:
- It is where new users are directed after they join the site, and where existing users are directed after long absences from the site. The idea is that we show people the “greatest hits” of the community when they arrive, whether it is the last year, or the last month.
- When you want to catch up with a busy Discourse and skim just the most interesting / useful / active topics. Sort of like the summarize topic option, but at the topic list level. You can view top globally or at the category level.
You can try it out yourself at /top.
One thing Sam noticed is that the existing formula for inclusion on the top page…
log of views + (2 * post count) + (2 * like count)
… does not give the first post any priority, but probably should. So he changed it to
(log of views * 2) + first post likes + (least(likes/posts, 3) * 4)
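To make the difference concrete, here is a rough Python sketch of both formulas. The function and parameter names are mine, not anything from the Discourse codebase, and I am assuming that “log” means natural log and that “likes” means total likes across the whole topic.

```python
import math


def old_score(views, post_count, like_count):
    """Original formula: log of views + (2 * post count) + (2 * like count).

    Assumes natural log and views >= 1.
    """
    return math.log(views) + 2 * post_count + 2 * like_count


def sam_score(views, post_count, like_count, first_post_likes):
    """Revised formula: (log of views * 2) + first post likes
    + (least(likes/posts, 3) * 4).

    Assumes like_count is the topic-wide total and post_count >= 1.
    """
    likes_per_post = min(like_count / post_count, 3)
    return math.log(views) * 2 + first_post_likes + likes_per_post * 4


# Two hypothetical topics with the same number of views.
chatty = dict(views=1000, post_count=200, like_count=200)
short = dict(views=1000, post_count=10, like_count=30)

print(old_score(**chatty), old_score(**short))
print(sam_score(**chatty, first_post_likes=0),
      sam_score(**short, first_post_likes=20))
```

With these made-up numbers, the old formula ranks the 200-post topic far above the short one purely on volume, while the revised formula prefers the short topic with the well-liked first post.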
This is better, but I am not sure it’s quite right.
I wanted to brainstorm a bit on the concept here. What is a “top” discussion, exactly? What criteria would make you pick a particular topic for inclusion in the top topics for the last week, month or year?
When I thought about this and browsed the topics on the existing /top list, here are the qualities they had:
- Not too long. Very long (100, 200, 1000+ post) topics are kind of overwhelming and resemble chat more than a discussion.
- Not too short. A topic with 0 replies is definitely not an interesting discussion. It’s more of a notice. A topic with 1 or 2 replies is very unlikely to be an interesting discussion. And even a topic with 10 posts may not be a particularly interesting discussion, depending on how many people are participating in the topic.
- Likes. These matter a lot, obviously, but how the likes are distributed is important. A topic with a first post that has zero likes but one reply that has 30 likes is still significant. Or a topic with 50 replies, and 3 of those replies have 10 likes. It seems to me that the “outlier” nature of some of the like density is what is important here. Just having 1 like on 30 replies is not interesting. But 30 likes on a single reply? That’s massive.
- First post likes. If the “head” of the topic is great, the discussion is likely to be great too. The first post sets the tone for all future replies.
- Views. If we only weight likes, we get some inside baseball, bikesheddy topics dominating the list. We need to measure objective, outside world interest in our topics just as heavily as internal interest. To be blunt, what we think matters, and what the outside world decides matters, can be two very different things. Consider the new user seeing the /top page after account creation: they want topics that are of legitimate interest to everyone, not just our insular little like-brigade.
- Topic length is problematic in that the larger the topic gets, the more replies there are to gather like votes. 200 replies with 1 like vote each is objectively a lot of like votes, but none of those posts are individually very interesting.
Based on the above, I think we should try:
(log of views * 2) + first post likes + (reply likes / ceiling(post count / 5))
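As a minimal sketch of that formula in Python (again, the names are hypothetical, natural log is assumed, and the `max(1, ...)` guard against a zero post count is my own addition):

```python
import math


def proposed_score(views, first_post_likes, reply_likes, post_count):
    """(log of views * 2) + first post likes
    + (reply likes / ceiling(post count / 5)).

    Assumes natural log and views >= 1.
    """
    # The divisor grows by 1 for every 5 posts, so reply likes are
    # worth less and less as the topic gets longer.
    divisor = max(1, math.ceil(post_count / 5))
    return 2 * math.log(views) + first_post_likes + reply_likes / divisor


# 200 posts with one like each versus a 20-post topic with a single
# reply that collected 30 likes (same views, no first post likes).
print(proposed_score(views=100, first_post_likes=0, reply_likes=200, post_count=200))
print(proposed_score(views=100, first_post_likes=0, reply_likes=30, post_count=20))
```

In this example the 20-post topic with the one standout reply scores higher than the 200-post topic, even though the long topic has far more likes in total.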
So (and I probably should graph this):
- What is the value of an anonymous like, that is, a view? (Using the natural log: 1.4 votes at 2 views, 4.6 votes at 10 views, 9 votes at 100 views, 14 votes at 1k views, 18 votes at 10k views, 23 votes at 100k views)
- What is the value of a first post like? (1x)
- What is the value of a reply like? (1x, declining rapidly at intervals of 5 posts)
This optimizes for the “sweet spot” of topics at around 5 to 40 posts. It gives the first post a boost and lets anon votes contribute. (Although as like counts get to 100+, the effect of views can be slim.)
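In place of a graph, here is a small Python sketch that tabulates the two moving parts: how many “votes” a given number of views is worth, and how much a single reply like is worth at a given topic length. The helper names are made up for illustration, and natural log is assumed because it approximately reproduces the view figures listed above.

```python
import math


def view_votes(views):
    """Contribution of views to the score: log of views * 2 (natural log)."""
    return 2 * math.log(views)


def reply_like_weight(post_count):
    """Value of one reply like: 1 / ceiling(post count / 5)."""
    return 1 / math.ceil(post_count / 5)


for views in (2, 10, 100, 1_000, 10_000, 100_000):
    print(f"{views:>7} views -> {view_votes(views):.1f} votes")

for posts in (5, 10, 20, 40, 100, 200):
    print(f"{posts:>4} posts -> each reply like counts as {reply_like_weight(posts):.3f}")
```

A reply like is worth a full vote up to 5 posts, half a vote up to 10, and an eighth of a vote at 40 posts, which is where the drop-off for long topics comes from.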
I think an even better approach would be to discard like counts for all replies and topics that are less than the mean or median for all topics in the specified interval… but I don’t think we can do that in SQL efficiently.
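For what it’s worth, here is how I read that idea, as an in-memory Python sketch rather than SQL. It is only one interpretation: every post’s like count in the interval is pooled, a cutoff (median by default, mean as an option) is computed, and likes on posts below the cutoff are ignored. The data shape and function name are hypothetical, and this says nothing about how expensive the equivalent query would be.

```python
from statistics import mean, median


def significant_likes(topics, statistic=median):
    """Sum only the likes on posts at or above the interval-wide cutoff.

    `topics` maps a topic id to a list of per-post like counts
    (hypothetical shape, first post included).
    """
    all_counts = [c for counts in topics.values() for c in counts]
    cutoff = statistic(all_counts)
    return {
        topic_id: sum(c for c in counts if c >= cutoff)
        for topic_id, counts in topics.items()
    }


# Made-up like counts for three topics in one interval.
topics = {
    "one_like_each": [1, 1, 1, 1, 1, 1],
    "outlier": [0, 30, 2, 3, 2],
    "mixed": [5, 2, 2, 4, 6],
}

print(significant_likes(topics))        # cutoff is the median
print(significant_likes(topics, mean))  # cutoff is the mean
```

With either cutoff, the topic where every post has exactly one like drops to zero, while the topic with the single 30-like reply keeps most of its weight.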