What does the posts.word_count column mean in the database?

I need word count per topic for a badge query.
I found the word_count column in the posts table, but it does not seem to be what I need.

For instance, this raw text:

так тут і питання, що якщо без вакцин ніяк, то обійдусь без кролів. Але є в деяких людей позитивний досвід, 3 роки без хвороб і вакцин один з них. Тримають в ямі.

has a word_count of 1.

So, it is either a bug in how word counting works, or the column is simply supposed to do something else.

Please advise.

AFAIK word_count is the site setting used for the “too short” modal, not a count of the words in a post.

I doubt that data exists. Not that it couldn’t be computed, but I fear it would be a massive resource hog to do retroactively.

Not sure how a site setting is related to a column in the posts table.

“word_count” is computed with “raw.scan(/\w+/).size”, which unfortunately doesn’t work properly on non-Latin sentences… :frowning:

I’m thinking of replacing it with “raw.scan(/[[:word:]]+/).size” which returns 32 for your example sentence. Is that correct?

["так", "тут", "і", "питання", "що", "якщо", "без", "вакцин", "ніяк", "то", "обійдусь", "без", "кролів", "Але", "є", "в", "деяких", "людей", "позитивний", "досвід", "3", "роки", "без", "хвороб", "і", "вакцин", "один", "з", "них", "Тримають", "в", "ямі"]
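For anyone who wants to reproduce the difference locally, here is a minimal sketch in plain Ruby. The `count_words` helper is a hypothetical name used only for illustration, not Discourse's actual method. In Ruby, `\w` matches only ASCII letters, digits, and underscore, so on the example sentence it finds just the "3", while the POSIX bracket `[[:word:]]` is Unicode-aware.

```ruby
# Hypothetical helper for illustration only; not Discourse's actual code.
def count_words(raw, pattern)
  raw.scan(pattern).size
end

# The example sentence from the original post.
raw = "так тут і питання, що якщо без вакцин ніяк, то обійдусь без кролів. " \
      "Але є в деяких людей позитивний досвід, 3 роки без хвороб і вакцин один з них. " \
      "Тримають в ямі."

# \w is ASCII-only in Ruby ([a-zA-Z0-9_]), so Cyrillic words are skipped.
ascii_count = count_words(raw, /\w+/)

# [[:word:]] matches Unicode letters, marks, numbers, and connector punctuation.
unicode_count = count_words(raw, /[[:word:]]+/)

puts ascii_count    # 1, matching the word_count reported above
puts unicode_count  # 32, matching the list above
```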

EDIT: FYI this value is used to compute the “read_time” of the topic :wink:

Your fixed example looks good!

Here’s the fix :sunflower:

https://github.com/discourse/discourse/commit/cf4cb2126a026f8240e5d0bc2ea97e8ebb206b81