Redefining "Top" scores

Not sure how much interest there is in this, but I have a few suggestions for tweaking the Top algorithm to increase engagement and provide a deeper discussion experience for users.

Give weight to the following (all adjustable in settings, so each forum can customize them; a rough scoring sketch follows the list):

Actual content of Topic:

  • Total Views & Views of Original Post (Lower rating if total views are significantly lower than Original Post views)
  • Original Post likes
  • Total likes (measured relative to the number of replies, giving a higher or lower rating accordingly)
  • Total number of Replies (fewer than 3 gets a negative rating)
  • Number of posts with 2 or more replies (higher number gives higher rating)
  • (Likes + Replies + Views) over time
  • Exceptional number of likes for any post (outlier post might be interesting)
  • Subscriptions to Topic (watched, tracked or muted), subscriptions in the past hour
  • Median Reading Time compared to estimated Reading Time, adjusted for number of posts
  • Analysis of median Reading time (decline or uptake over past 24 hours, past 12 hours, past 3 hours, past hour)
  • Number of users that have replied in topic, adjusted for number of posts
  • Number of users online in topic
  • Number of posts bookmarked by users TL2 or higher, adjusted for number of posts
  • Writing time for posts within topic, especially for Original Post
  • Number of tags (1 to 3 tags get a higher rating)
  • Number of posts viewed in topic before closing on average, adjusted for number of posts
  • Questions without solutions (higher weighting for a certain time window)
  • Allowance for bumping old topics, in case they disappeared unfairly
  • Original Poster reputation (level, badges, solutions, blocks, flags, likes)
  • Referral traffic (higher score from within forum, lower score for outside)
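A minimal sketch of how these signals might combine, assuming each signal has already been normalized to roughly 0-1 and the weights come from per-forum settings. All names and values here are hypothetical, not Discourse internals:

```python
# Hypothetical sketch: combine normalized topic signals into a Top score.
# Signal names and default weights are illustrative only.

DEFAULT_WEIGHTS = {
    "op_likes": 3.0,
    "likes_per_reply": 2.0,
    "reply_count": 1.5,
    "deep_reply_chains": 2.0,   # posts with 2 or more replies
    "median_read_ratio": 2.5,   # median reading time vs. estimated time
    "bookmarks_tl2_plus": 1.5,
    "unsolved_question": 1.0,
    "op_reputation": 1.0,
}

def top_score(signals: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of normalized signals; missing signals count as 0."""
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())

# Example: a topic with a strong original post and an active reply tree.
topic = {"op_likes": 0.8, "likes_per_reply": 0.6, "reply_count": 0.5,
         "deep_reply_chains": 0.7, "median_read_ratio": 0.9}
print(round(top_score(topic), 2))
```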

Since we are measuring these for a specific time period, we would need to compute median metrics for each month to account for user growth and weight the above against the monthly numbers. Also, maybe we could exclude TL0 views/likes/etc. to protect against gaming of the algorithm.
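For the monthly normalization, one simple option (again hypothetical) is to divide each raw metric by that month's site-wide median, so the weights stay stable as the community grows:

```python
def normalize(raw: float, monthly_median: float) -> float:
    """Scale a raw metric against the site median for the month,
    so overall traffic growth doesn't inflate every topic's score."""
    if monthly_median <= 0:
        return 0.0
    return raw / monthly_median

# 120 views in a month where the median topic gets 80 views -> 1.5
print(normalize(120, 80))
```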

If we wanted to go the extra mile and personalize the feed for each user, we could also give weight to the following (a sketch follows the list):

  • User Interests (tracking/watching categories/subcategories and tags)
  • Visits by User (categories/subcategories & tags of topics ordered by times visited, negative rating for ignored categories in feed)
  • Time since the topic was posted, relative to the user’s last visit (slight priority for recent posts within the selected time period - today, week, month, quarter, year)
  • Topics posted by the users you have liked most, and topics whose original posts were liked by your top 5 most-liked users
  • Potential user interests (categories/subcategories the user does not frequent or watch/track) - lower rating if the user continues to ignore them
  • Show content based on analysis of last 30 interactions of User
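A personalization layer could then adjust the community score per user. A rough sketch with the same hypothetical naming:

```python
def personalized_score(base: float, user: dict, topic: dict) -> float:
    """Hypothetical per-user adjustment of the community Top score."""
    score = base
    if topic["category"] in user.get("watched_categories", set()):
        score *= 1.3    # declared interest boosts the topic
    if topic["category"] in user.get("ignored_categories", set()):
        score *= 0.5    # negative rating for ignored categories
    if topic["op_id"] in user.get("top_liked_users", set()):
        score *= 1.2    # topics started by your most-liked users
    return score

user = {"watched_categories": {"dev"}, "top_liked_users": {42}}
print(personalized_score(8.0, user, {"category": "dev", "op_id": 42}))
```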

If we wanted to go full crazy without getting into Machine Learning, we could also do a discussion analysis. This could then also be used when summarizing a topic to expand the better posts.

I have borrowed the following analysis text from an LMS I worked with - we can draft a similar algorithm for analyzing discussions:

Substantive posts

The substantive posts metric counts responses or replies that contribute to the discussion’s development. A substantive post contains sentences that establish or support a user’s position or ask thoughtful questions. These posts also show critical thinking or sophisticated composition, based on word choice and variety.

Non-substantive posts may be short or underdeveloped. A user should expand on their post and explain a position to make the response or reply substantial.
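As a rough illustration only (the LMS’s actual rules aren’t published), a heuristic for "substantive" might check length plus a position or question cue:

```python
import re

# Guessed cues that a post establishes or supports a position.
POSITION_CUES = ("i think", "i believe", "in my experience", "because",
                 "for example", "on the other hand")

def is_substantive(post: str, min_sentences: int = 2) -> bool:
    """Heuristic guess: enough sentences plus a position or question cue."""
    sentences = [s for s in re.split(r"[.!?]+", post) if s.strip()]
    has_cue = "?" in post or any(c in post.lower() for c in POSITION_CUES)
    return len(sentences) >= min_sentences and has_cue

print(is_substantive("I agree."))                                    # False
print(is_substantive("I think this fails at scale. For example, "
                     "median read time collapses past 200 posts."))  # True
```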

Sentence complexity

Sentence complexity is measured by the number of sentences, words, and syllables in each response. We look at the complexity of words and how often the words are used. This measurement is a linguistic standard called Flesch-Kincaid. The complexity of each user’s total posts is represented by a grade level from 1st grade to 16th grade. Content with a Flesch-Kincaid grade level of 10 should be easily understood by a person in 10th grade.
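The Flesch-Kincaid grade level itself is a published formula, so it can be computed directly; the syllable counter below is a common rough approximation:

```python
import re

def count_syllables(word: str) -> int:
    """Rough approximation: count vowel groups, minimum 1 per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

# Simple prose scores a low grade level.
print(round(fk_grade("The cat sat on the mat. It was happy there."), 1))
```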

Lexical variation

Lexical variation analyzes the substance of a user’s responses or replies based on the words they’ve used.

Content words carry meaning in a user’s response or reply. These words show a user’s feelings or thoughts regarding the prompt. When compared with total word count, content words help show the lexical density of a user’s responses and replies. A high count can indicate more sophisticated writing.

Functional words tie the semantic elements of a sentence together and indicate proper grammar. Prepositions, conjunctions, pronouns, and articles are functional words.

Think of functional words as the glue that holds a user’s response together. The words may not have substantial meaning themselves.
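Lexical density reduces to a simple ratio. A sketch with a small hand-picked stoplist of functional words (a real analysis would presumably use part-of-speech tagging):

```python
FUNCTIONAL = {"the", "a", "an", "and", "or", "but", "of", "in", "on", "to",
              "it", "is", "are", "was", "i", "you", "he", "she", "they", "we"}

def lexical_density(text: str) -> float:
    """Share of content (non-functional) words in the total word count."""
    words = [w.lower().strip(".,!?") for w in text.split()]
    words = [w for w in words if w]
    content = [w for w in words if w not in FUNCTIONAL]
    return len(content) / len(words) if words else 0.0

print(round(lexical_density("The algorithm rewards long, thoughtful replies."), 2))
```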

Critical thinking

The critical thinking metric flags words and phrases within a user’s total posts that demonstrate critical thinking. Twelve dictionaries are used to identify these words, which then fall into one of the weighted categories of critical thinking:

  • Argue a position
  • Include supporting data
  • Cite literature or experience
  • Evaluate
  • Summarize
  • Reference data
  • Offer a hypothesis

How we measure critical thinking

The weighted counts of the words and phrases in each category are combined and then compared to the site average to create the critical thinking score. The score is the difference between the user’s critical thinking and the site average.

The score falls in a decimal range of -1 to 1. A negative score means the user’s critical thinking is below the site average. A positive score means the user’s critical thinking is above the site average. A score close to 0 means the user’s critical thinking is at the site average level. These scores are represented by a range of low to high:

  • -1 to -0.06 = Low
  • -0.06 to -0.03 = Below Average
  • -0.03 to 0.03 = Average
  • 0.03 to 0.06 = Above Average
  • 0.06 to 1 = High

Critical thinking is represented visually to show each user’s score compared to the site average.
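Based on the description above, a sketch of the computation. The category weights and dictionary contents aren’t published, so these values are placeholders, and the scaling into the -1..1 band is a guess:

```python
# Placeholder weights per critical-thinking category; real values unknown.
CATEGORY_WEIGHTS = {
    "argue_position": 1.0, "supporting_data": 0.9, "cite_literature": 0.7,
    "evaluate": 0.8, "summarize": 0.5, "reference_data": 0.7,
    "hypothesis": 0.9,
}

def critical_thinking_score(user_counts: dict, site_avg: float) -> float:
    """Weighted category hits vs. the site average, clamped to [-1, 1]."""
    weighted = sum(CATEGORY_WEIGHTS[c] * n for c, n in user_counts.items())
    if site_avg <= 0:
        return 0.0
    return max(-1.0, min(1.0, (weighted - site_avg) / site_avg))

def band(score: float) -> str:
    """Map a score to the low-to-high bands listed above."""
    if score < -0.06: return "Low"
    if score < -0.03: return "Below Average"
    if score <= 0.03: return "Average"
    if score <= 0.06: return "Above Average"
    return "High"

print(band(critical_thinking_score({"argue_position": 3, "evaluate": 1}, 3.5)))
```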

Examples:

Empirical research shows disagreeing displays a higher level of critical thinking than agreeing. In a discussion, the statement “I agree with John” receives a score of 0.113, while “I disagree with John” receives a score of 0.260.

If users summarize a passage but add no opinion or argument, they score lower than others who argue a position.

If users cite literature, they receive a lower score than others who offer a hypothesis.

Word variation

Word variation measures the number of unique words in a user’s submission as a percentage. A higher percentage of unique words can show that the user’s composition contains multiple ideas, significantly supports a position, or engages other users to think about other perspectives.

We can compare the user’s percentage to the site average.
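Word variation is the simplest metric of the set, a unique-word ratio:

```python
def word_variation(text: str) -> float:
    """Unique words as a percentage of total words."""
    words = [w.lower().strip(".,!?") for w in text.split() if w.strip(".,!?")]
    return 100 * len(set(words)) / len(words) if words else 0.0

print(round(word_variation("the quick fox jumped over the lazy dog"), 1))  # 87.5
```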

3 Likes

This sounds incredibly complicated. What evidence do you have that the current much simpler methods are not good enough?

3 Likes

It’s not that the current system isn’t good enough - it works well, but like any system it can be made better, in particular by making personalised recommendations unique to every user. Interest, relationships, and usage should be taken into account to surface the most interesting posts, since ‘interesting’ is so subjective. Refining the user feed this way could make for even deeper discussion and engagement.

Discourse currently uses the following criteria when deciding which topics are most interesting to a user:

  • Likes
  • Number of Posts
  • Original Post likes
  • Views

It already has a bunch of data that could better filter the most interesting posts:

Reading time:
If users are spending a lot of time reading a topic (adjusted for number of posts), it would tend to be more interesting than a topic users skim or close after reading a few posts.

Topic with multiple replies to posts:
A topic that has a lot of replies to posts might have an interesting discussion going on.

Subscriptions to topic:
If users are tracking/watching a topic with interest, it would have more utility than usual.

Number of posts bookmarked:
If a lot of posts are getting bookmarked, the topic would tend to have some important information.

High number of likes:
If a single post has an unusual number of likes compared to the median, it would tend to have community interest.

Number of users online:
A large number of users lurking on a topic might indicate an interesting discussion.

OP Reputation:
A user with a higher trust level and reputation (badges, solutions, likes) would tend to have higher quality topics. Similarly, a user who has been flagged multiple times for example, would tend to post lower quality topics.

Referral traffic:
If a topic is getting a lot of outside referral traffic, it would have something of interest to the community.
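Most of these signals reduce to a ratio against a baseline. For instance, a hypothetical reading-time signal adjusted for post count:

```python
def read_time_signal(total_read_seconds: float, post_count: int,
                     expected_seconds_per_post: float = 30.0) -> float:
    """Hypothetical: actual vs. expected reading time per post.
    Above 1 suggests readers linger; below 1 suggests skimming."""
    if post_count == 0:
        return 0.0
    return (total_read_seconds / post_count) / expected_seconds_per_post
```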

Additionally, a user would be more interested in content that is tailored to their taste. For instance:

User interest:
If a user is tracking/watching a category/subcategory, they would be more likely to find a topic within those interesting.

Liked users:
If I consistently like a user’s posts, I tend to find their content more interesting. Showing topics started by the most liked users would tend to be of interest.

Potential user interest:
On the other hand, users could also be shown a small number of topics from categories they do not watch/track, in case that is something of potential interest.

As with the current system, admins would be able to set the weights and so customize it to their taste.

3 Likes

I find your lists very interesting IMO (I probably have no humble opinion), but I can’t see them working because, as @codinghorror says, they sound “incredibly complicated” without a lot more refining of the focus.

I also got confused, which is why I’d like to see your suggestions more clearly presented.

2 Likes