Improving the "Top" criteria

The top page in Discourse is intended for two cases:

  1. It is where new users are directed after they join the site, and where existing users are directed after long absences from the site. The idea is that we show people the “greatest hits” of the community when they arrive, whether it is the last year, or the last month.

  2. When you want to catch up with a busy Discourse and skim just the most interesting / useful / active topics. Sort of like the summarize topic option, but at the topic list level. You can view top globally or at the category level.

You can try it out yourself at /top.

One thing Sam noticed is that the existing formula for inclusion on the top page…

log of views + (2 * post count) + (2 * like count)

… does not give the first post any priority, but probably should. So he changed it to

(log of views * 2) + first post likes + (least(likes/posts, 3) * 4)

This is better, but I am not sure it’s quite right.
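To make the difference concrete, here is a rough Python sketch of the two formulas above. It is not the actual Discourse query. The function names are made up for illustration, "log" is assumed to be the natural log, and "likes" is assumed to mean the total like count for the topic.

```python
import math


def original_score(views: int, posts: int, likes: int) -> float:
    """log of views + (2 * post count) + (2 * like count)."""
    # Natural log is an assumption; max() guards against log(0).
    return math.log(max(views, 1)) + 2 * posts + 2 * likes


def revised_score(views: int, posts: int, likes: int, first_post_likes: int) -> float:
    """(log of views * 2) + first post likes + (least(likes/posts, 3) * 4)."""
    likes_per_post = likes / posts if posts > 0 else 0.0
    return 2 * math.log(max(views, 1)) + first_post_likes + min(likes_per_post, 3) * 4


# A long, chatty topic versus a compact topic with a well-liked first post.
chatty = dict(views=1000, posts=200, likes=200)
compact = dict(views=1000, posts=10, likes=40)

print(original_score(**chatty), original_score(**compact))
print(revised_score(**chatty, first_post_likes=0),
      revised_score(**compact, first_post_likes=25))
```

Under these assumptions the original formula ranks the chatty topic first purely because of its size, while the revised one ranks the compact topic first.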

I wanted to brainstorm a bit on the concept here. What is a “top” discussion, exactly? What criteria would make you pick a particular topic for inclusion in the top topics for the last week, month or year?

When I thought about this and browsed the topics on the existing /top list, here are the qualities they had:

  • Not too long. Very long (100, 200, 1000+ post) topics are kind of overwhelming and resemble chat more than a discussion.

  • Not too short. A topic with 0 replies is definitely not an interesting discussion. It’s more of a notice. A topic with 1 or 2 replies is very unlikely to be an interesting discussion. And even a topic with 10 posts may not be a particularly interesting discussion, depending on how many people are participating in the topic.

  • Likes. These matter a lot, obviously, but how the likes are distributed is important. A topic with a first post that has zero likes but one reply that has 30 likes is still significant. Or a topic with 50 replies, and 3 of those replies have 10 likes. It seems to me that the “outlier” nature of some of the like density is what is important here. Just having 1 like on 30 replies is not interesting. But 30 likes on a single reply? That’s massive.

  • First post likes. If the “head” of the topic is great, the discussion is likely to be great too. The first post sets the tone for all future replies.

  • Views. If we only weight likes, we get some inside baseball, bikesheddy topics dominating the list. We need to measure objective, outside world interest in our topics just as heavily as internal interest. To be blunt, what we think matters, and what the outside world decides matters, can be two very different things. Consider the new user seeing the /top page after account creation – they want topics that are of legitimate interest to everyone, not just our insular little like-brigade.

  • Topic length is problematic in that the larger the topic gets, the more replies there are to gather like votes. 200 replies with 1 like vote each is objectively a lot of like votes, but none of those posts are individually very interesting.

Based on the above, I think we should try:

(log of views * 2) + first post likes + (reply likes / ceiling(post count / 5))

So (and I probably should graph this):

  1. What is the value of an anonymous like, i.e. a view? (1.4 votes at 2 views, 4.6 votes at 10 views, 9 votes at 100 views, 14 votes at 1k views, 18 votes at 10k views, 23 votes at 100k views)
  2. What is the value of a first post like? (1x)
  3. What is the value of a reply like? (1x, declining rapidly at intervals of 5 posts)
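The three values above can be checked with a small Python sketch of the proposed formula. This is only an illustration: the function names are invented, and the natural log is assumed because it reproduces the view figures listed above to within rounding.

```python
import math


def view_points(views: int) -> float:
    """Contribution of views: log of views * 2 (natural log assumed)."""
    return 2 * math.log(max(views, 1))


def reply_like_divisor(post_count: int) -> int:
    """ceiling(post count / 5), never below 1 so we cannot divide by zero."""
    return max(math.ceil(post_count / 5), 1)


def proposed_score(views: int, first_post_likes: int,
                   reply_likes: int, post_count: int) -> float:
    """(log of views * 2) + first post likes + (reply likes / ceiling(post count / 5))."""
    return (view_points(views)
            + first_post_likes
            + reply_likes / reply_like_divisor(post_count))


# How much views are worth at each order of magnitude.
for views in (2, 10, 100, 1_000, 10_000, 100_000):
    print(views, round(view_points(views), 1))

# How quickly a reply like loses value as the topic grows.
for post_count in (5, 10, 40, 200):
    print(post_count, 1 / reply_like_divisor(post_count))
```

With this divisor, a reply like is worth a full vote in a topic of up to 5 posts, half a vote up to 10 posts, and an eighth of a vote at 40 posts.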

This optimizes for the “sweet spot” of topics at around 5 - 40 posts. It gives the first post a boost and lets anon votes contribute. (Although as like counts get to 100+, the effect of views can be slim.)

I think an even better approach would be to discard like counts for all replies and topics that are less than the mean or median for all topics in the specified interval… but I don’t think we can do that in SQL efficiently.
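As a rough illustration of that idea (in plain Python rather than SQL, so it says nothing about whether the query could be made efficient), one reading is to count a post's likes only when they reach the mean or median like count for the interval. The function name and the sample numbers are made up.

```python
from statistics import mean, median


def outlier_likes(topic_post_likes, interval_post_likes, stat=median):
    """Sum only the like counts that are not below the interval-wide statistic.

    topic_post_likes: like count of each post in one topic.
    interval_post_likes: like counts of all posts in the chosen interval.
    stat: mean or median, per the idea above; which works better is an open question.
    """
    threshold = stat(interval_post_likes)
    return sum(likes for likes in topic_post_likes if likes >= threshold)


# Hypothetical like counts for every post in the interval.
interval = [0, 0, 1, 1, 1, 2, 3, 30]

many_small = [1] * 30   # 30 replies with 1 like each
one_big = [0, 30]       # a single reply with 30 likes

print(outlier_likes(many_small, interval, stat=mean),
      outlier_likes(one_big, interval, stat=mean))
```

With the mean as the cutoff, the 30 single likes are discarded and the one 30-like reply survives. With the median, many forums would see a cutoff of 0 or 1, which filters very little.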

15 Likes

Algos are interesting but I almost always have trouble wrapping my head around them.

One thought is that maybe some kind of post_count to number_of_different_members ratio might be useful as an indicator of appeal.
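A minimal Python sketch of that ratio, assuming we have the author of each post in a topic (the function name and input shape are hypothetical):

```python
def posts_per_participant(post_authors):
    """post_count divided by number_of_different_members for one topic.

    post_authors: one entry per post, naming the member who wrote it.
    Values near 1.0 mean many different voices; high values mean a few
    members going back and forth.
    """
    if not post_authors:
        return 0.0
    return len(post_authors) / len(set(post_authors))


print(posts_per_participant(["ann", "bob", "ann", "bob", "ann", "bob"]))
print(posts_per_participant(["ann", "bob", "cat", "dan", "eve", "fay"]))
```

How this ratio would be folded into the score (as a penalty, a multiplier, or a cutoff) is left open here.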

4 Likes

First off, sorry for the really late reply. It is only now that I’ve had some real interest in the Top page, as we may be shifting/trying it out as our homepage soon to see if it increases activity (and your recent post reminded me of this topic).

So I like the direction where this is going. Especially for our audience, where the initial discussion post is typically a question with the original poster not certain as to what direction is best to go. Take “Large applications without ORM?”, for example: the first response may get a ton of likes because it provided a good point for the user to consider, but the actual question itself may generate zero likes as it was just a user who was uncertain about the best approach to take.

I also like this idea. I think having several contributors in the discussion is meaningful. That means you have more than just 3 people chatting back and forth, so you get a wider set of opinions/statements from a variety of backgrounds.

Currently, I don’t give a lot of weight to the first post in a topic, since it is just the discussion starter. I would have thought most of the likes would occur in replies 2-19, with maybe a few less frequent ones beyond post 20 (I don’t have any metrics to support that though).

2 Likes

I feel there is no one “right” value for top and most of these variables should in fact be site setting variables where the site owner can change the weight assigned to each one… I know @charleswalter has asked for this as well.
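To sketch what that could look like, here is the formula from the first post with each constant pulled out into a weight the site owner can override. The setting names are hypothetical and are not actual Discourse site settings. The defaults reproduce the current formula, and natural log is assumed.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class TopWeights:
    # Hypothetical setting names; defaults mirror
    # (log of views * 2) + first post likes + (least(likes/posts, 3) * 4).
    log_views_multiplier: float = 2.0
    first_post_likes_multiplier: float = 1.0
    likes_per_post_cap: float = 3.0
    likes_per_post_multiplier: float = 4.0


def top_score(views: int, posts: int, likes: int, first_post_likes: int,
              weights: Optional[TopWeights] = None) -> float:
    w = weights or TopWeights()
    likes_per_post = likes / posts if posts > 0 else 0.0
    return (w.log_views_multiplier * math.log(max(views, 1))
            + w.first_post_likes_multiplier * first_post_likes
            + w.likes_per_post_multiplier * min(likes_per_post, w.likes_per_post_cap))


# A site that cares more about outside interest could raise the views weight.
views_heavy = TopWeights(log_views_multiplier=5.0)
print(top_score(1000, 10, 40, 25))
print(top_score(1000, 10, 40, 25, views_heavy))
```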

3 Likes

BTW, I tried setting the value of likes for staff to be worth like 50000x but still I couldn’t make good topics make it to the top of the Top!

The topics we are currently seeing at the top of the top are not that relevant to the rest of the community, and I am contemplating turning off the emails, unfortunately. This feature really has a lot of potential to get users coming back to the community if we can push more relevant content.

If there are ideas on things we could do to improve the experience, would love to hear.

2 Likes

It would also be neat if we could increase the weight of likes for moderators, who are not necessarily admins. I couldn’t figure out if there was a way to do this.

1 Like

To clarify: is this one of the topics you want to remove from Top?

If so, then it sounds like you will need explicit manual curation. The like counts on every post are very high.

http://www.helloforos.com/t/guerra-de-besos/27555

1 Like

I think the definition of top posts depends on what your forum does.
I am not sure a good post should not be very long. Perhaps, for this forum, it’s right, but for a literal forum, I do not think this is true.
So in my opinion, we should give the site administrators the possibility to define their own formula.
Perhaps only a human could select the best posts. Add the possibility to allow the site editors to select the top posts.

1 Like

@riking I don’t mind if that content makes it somewhere on the top, I just want to be able to have more control over what makes it to the very top, and then ultimately into an email to a user. When we’re sending emails to a few thousand users, we need to make sure that the content is going to be more appealing.

What you may want to do in the meantime is disable digests, then export your user list as CSV every week and send a hand-curated email instead.

Okay, this is now done via

https://github.com/discourse/discourse/pull/3968

3 Likes

So good!
Will there be individual weights added to categories eventually?

Good idea, that could be metadata attached to the category for sure. We used to actually have a “hotness” you could set on a category from 1-10 and maybe we should bring that back.

1 Like

I suspect a 1-10 “hotness” setting would feel alien to most community managers though. It ought to be simpler than that, e.g. “Important” (prioritised) vs “Not important” (de-prioritised). Pretty sure I’ve seen some torrent clients letting users weigh their active downloads this way.

Don’t see how limiting scale range is good.

You should name it weight (or multiplier), just like other parameters. It should be 1 by default, then any real number should work: 1.05, 0.7 and so on. Let the community admins decide the scale.

For instance, I have 3 categories that are really not important, so I’d assign them a factor of 0.5; then there are standard ones (1), and a few very important ones (1.3). A simple case, but already 3 different multipliers.

2 states - too few.


If people don’t understand multipliers, make the input “per cent” from 1-1000%
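A quick Python sketch of how such a per-category weight could be applied, assuming it simply multiplies whatever base score the topic already has. The function names, the category names, and the dictionary of weights are all made up for illustration.

```python
def percent_to_multiplier(percent: float) -> float:
    """Convert the 1-1000% input suggested above into a plain multiplier."""
    if not 1 <= percent <= 1000:
        raise ValueError("percent must be between 1 and 1000")
    return percent / 100


def weighted_score(base_score: float, category: str, category_weights: dict) -> float:
    """Scale a topic's score by its category weight; unlisted categories default to 1."""
    return base_score * category_weights.get(category, 1.0)


# Hypothetical categories using the 0.5 / 1 / 1.3 example from this post.
weights = {
    "off-topic": percent_to_multiplier(50),
    "announcements": percent_to_multiplier(130),
}

for category in ("off-topic", "general", "announcements"):
    print(category, weighted_score(20.0, category, weights))
```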

1 Like

Since we don’t really have an answer for this, and I tend to agree that

… which was implemented by @techapj in 1.5, going to close this for now.

4 Likes

A post was split to a new topic: Redefining “Top” scores