How is word usage calculated? From what I'm seeing from our users, it seems to include:
Topic and category titles for each post, even though the topic wasn't created by the user. A couple of users have “shenanigans”, “Dice”, and “Mongerer” in their top 5. These are categories or threads that have tons of posts, but the words aren't really used that much in the content of the threads or elsewhere.
Emoji names: one user adds the same emoji to all their posts, and “musical” and “keyboard” were in their top 5 words.
We use our search data to find a user's posts, and that data ends up with the title and category added. The emoji likely gets processed from :musical_keyboard: (its markdown reference) to “musical” and “keyboard”.
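A rough illustration of how a shortcode could turn into two ordinary words, assuming the indexer splits on anything that isn't a letter. This `naive_tokens` helper is hypothetical and is not Discourse's actual tokenizer:

```python
import re


def naive_tokens(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, then keep only runs of letters.
    # Colons and underscores act as separators, so ":musical_keyboard:"
    # comes out as two separate words.
    return re.findall(r"[a-z]+", text.lower())


print(naive_tokens("Great riff :musical_keyboard:"))
```

Under that assumption, every post carrying the emoji adds one “musical” and one “keyboard” to the user's counts.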
I think we'd need to do some additional processing, or use a different source for the post data, to avoid these. The category case is probably more likely on sites where people make many short posts (or image-only posts) in the same category, because then the category name appears very often relative to the rest of the post content.
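A minimal sketch of both fixes together, assuming the post body can be read on its own (the `"raw"` key below is hypothetical) rather than from the search blob that already has the title and category prepended, and assuming emoji appear as `:name:` shortcodes:

```python
import re
from collections import Counter

# Emoji shortcodes such as :musical_keyboard: or :+1: (assumed format).
EMOJI_SHORTCODE = re.compile(r":[a-z0-9_+-]+:")
# Plain words, allowing one internal apostrophe ("don't").
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")


def body_word_counts(posts):
    """Count words in post bodies only.

    `posts` is an iterable of dicts whose hypothetical "raw" key holds the
    post body alone: no topic title, no category name.
    """
    counts = Counter()
    for post in posts:
        # Remove shortcodes before tokenizing so they never split into words.
        text = EMOJI_SHORTCODE.sub(" ", post["raw"].lower())
        counts.update(WORD.findall(text))
    return counts


posts = [
    {"raw": "Rolled the dice again :musical_keyboard:"},
    {"raw": "Dice are fun"},
]
counts = body_word_counts(posts)
print(counts.most_common(3))
```

Subtracting title and category words from the search blob instead would also remove genuine uses of those words in the body, which is why a separate body source is probably the cleaner option.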
Yes, something is very wrong with these word frequency results. For me, “useful” is one of the top 5 unusual words, but it seems I never used this word. I searched and got lots of “results”, the top three of which don't even contain the word, and the discobot sidebar notes:
It seems there is no direct result matching “@Ed_S useful” in the search.
Is there some over-aggressive stemming or fuzzy matching going on?
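One way this could happen, assuming (unconfirmed) that words are counted by stem rather than as typed: a stemmer folds “use”, “uses”, and “useful” into one bucket, so a user who only ever wrote “use” and “uses” still has a non-zero count for the bucket “useful” lands in. Real English stemmers such as Snowball do reduce “useful” to “use”. The `toy_stem` below is a hypothetical stand-in that mimics this for a few suffixes only:

```python
from collections import Counter

# Toy suffix stripper for illustration only; this is NOT Porter/Snowball.
SUFFIXES = ("ful", "ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 letters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical tokens from one user's posts: "useful" never appears.
user_tokens = ["use", "uses", "dice", "use"]

exact = Counter(user_tokens)
stemmed = Counter(toy_stem(t) for t in user_tokens)

print(exact["useful"])              # 0: the word was never typed
print(stemmed[toy_stem("useful")])  # 3: "use", "uses", "use" share its stem
```

Comparing an exact-token count with the stemmed count like this is a quick way to check whether a surprising word was actually written.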