Degauss your screens, Discourse Rewind 2025 is here 💾

How is word usage calculated? From what I’m seeing from our users, it seems to include:

  • topic & category titles for each post, even though the topic wasn’t created by the user. A couple of users have “shenanigans”, “Dice”, and “Mongerer” in their top 5. These are categories or threads that have tons of posts, but the words aren’t really used that much in the content of the threads or elsewhere.
  • emoji names: one user adds :musical_keyboard: to all their posts, and “musical” and “keyboard” were in their top 5 words.

We use our search data to find a user’s posts, and that data ends up with the title and category added… and the emoji likely gets processed from :musical_keyboard: (its markdown reference) to “musical” and “keyboard.”

I think we’d need to do some additional processing or use a different source for the post data to avoid these… the category case is probably more likely to happen on sites where people make many short posts (or image only posts) in the same category, because in that case the category would appear very often relative to other post content.
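To illustrate the kind of additional processing I mean, here is a rough sketch in Python. It is not how Rewind actually works; `top_words`, the two regexes, and the idea of passing in only the raw post bodies (no title or category) are all assumptions for the sake of the example. It drops emoji shortcodes before counting, so :musical_keyboard: never turns into “musical” and “keyboard”.

```python
import re
from collections import Counter

# Hypothetical pattern for emoji shortcodes such as :musical_keyboard: or :+1:.
# The lookarounds avoid matching the middle of things like "10:30:45".
EMOJI_SHORTCODE = re.compile(r"(?<!\w):[a-z0-9_+\-]+:(?!\w)", re.IGNORECASE)

# Simple word pattern: runs of letters, allowing inner apostrophes ("wasn't").
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def top_words(raw_posts, n=5, stopwords=frozenset()):
    """Count words in raw post bodies only (assumed input: no title or
    category text), ignoring emoji shortcodes."""
    counts = Counter()
    for raw in raw_posts:
        # Normalize curly apostrophes, lowercase, then blank out shortcodes.
        text = raw.replace("\u2019", "'").lower()
        text = EMOJI_SHORTCODE.sub(" ", text)
        counts.update(w for w in WORD.findall(text) if w not in stopwords)
    return [word for word, _ in counts.most_common(n)]


posts = [
    "Great jam today :musical_keyboard:",
    "New riff, great tone :musical_keyboard:",
    ":musical_keyboard:",
]
print(top_words(posts, n=3))
```

The category problem would be handled the same way in spirit: count from the post body alone rather than from search data that already has the title and category mixed in.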


Yes, something is very wrong with these word frequency results. For me, “useful” is one of the top 5 unusual words. But it seems I never used this word: I searched, got lots of “results”, the top three of which don’t even contain the word, and the discobot sidebar notes:

It seems there is no direct result matching “@Ed_S useful” in the search.

Is there some over-aggressive stemming or fuzzy matching going on?
