Hi,
It seems that Discourse has some trouble dealing with Chinese characters. Our users cannot submit topics or posts if they write in Chinese. In this case, I can see that it’s a long message, but we still get the “Body seems unclear” message.
Any idea?
I see what is happening here.
We automatically disable this on Chinese forums but your forum appears to be English with a Chinese category.
Just set body min entropy to 0 in site settings.
Hum. Correction. It seems that setting body min entropy to 0 did not fix the issue. I tried with another text in Chinese and I still get the same error even though body min entropy is set to 0.
Did I miss something?
Hi,
Following up on this issue. I’m running some tests with the latest version of Discourse.
Body min entropy is set to 0. Same for Title min entropy.
When trying to create a topic with the body below I get the “Body unclear” error:
【澳門日報5月29日消息】國際會議協會(ICCA)日前發佈《二○一七年國際協會會議市場年度報告》。當中澳門多項評比的排位連續兩年均有上升,其中全球城市排名由一六年的七十二名躍升至第六十五名;亞太區域城市排名升一位至第十六,排名超過瑞士的日內瓦、澳大利亞的布里斯班、阿拉伯聯合酋長國的迪拜、韓國的釜山和濟州等城市
Is there a quick workaround for this? My Chinese users are getting nervous because of this issue.
Thx
Seb
I’ve looked into this issue. As a new user I can only put a single picture in a post, so here is just the evidence and the conclusion.
Conclusion, for both title and body
Sorry for reviving this, but we have hit the same issue on our forum, which is primarily in English but has some sections in other scripts. Setting body min entropy to 0 did not fix this.
The issue seems to be that the use of some Latin characters trips the all caps check. Here’s an example of a message that bumps into the Body seems unclear notice:
我看了一下,我8/15寄往俄罗斯的明信片10/13对方收到了,但是10/27寄的对方还没收到,现在已经36天了(不过同一批寄往不同国家的也没被收到)。
因为我是直接投的邮筒所以也不太清楚是不是寄不过去… 如果你在UCPC微信群里也许可以问下大家?
Is the allow uppercase posts the only solution here? On forums like ours where English is the main language, enabling that is not ideal, but I can also understand the frustration of users entering a valid message in their script bumping into that error. Could checking the ratio of CAPS versus the size of the body help here?
That is what it does, and in your example the ratio is 100%.
When a forum default language is set to Chinese we tweak those settings automatically, but if you have mixed languages in a single instance you need to tweak that setting.
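To illustrate why the example above counts as 100% caps, here is a minimal sketch. It assumes the check compares the text with its downcased and upcased forms; `looks_all_caps?` is a hypothetical name and not Discourse's actual code.

```ruby
# Hypothetical stand-in for an all-caps check; not Discourse's actual code.
def looks_all_caps?(text)
  # Text is fine if it contains no uppercase letters at all,
  # or if upcasing it would change something (it has lowercase letters).
  fine = text == text.downcase || text != text.upcase
  !fine
end

sample = "如果你在UCPC微信群里也许可以问下大家?"

# Han characters have no case, so "UCPC" is the only cased content
# and every cased letter is uppercase.
puts looks_all_caps?(sample)                  # => true
puts looks_all_caps?("如果你在ucpc微信群里")  # => false
```

Under that assumption, a handful of uppercase Latin letters inside an otherwise Chinese message is enough to trip the check.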
If the text has a single letter character that has no upper/lower case variant (like with Chinese), then the text is automatically not all uppercase. This could be checked by matching against /\p{Lo}/ in here.
This approach would not require a special setting tweak for forums primarily in zh/ko/ja and would also play well with forums where mixed languages are used, enforcing the uppercase rule only where the text consists solely of characters that can be uppercased.
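The idea could be sketched roughly as follows. The method name and the surrounding comparison are assumptions made for illustration; only the `/\p{Lo}/` match comes from the suggestion itself.

```ruby
# Letters with no upper/lower case variant (Han, kana, Hangul, ...).
CASELESS_LETTER = /\p{Lo}/

# Hypothetical helper, not part of Discourse.
def all_caps_with_caseless_guard?(text)
  # Any caseless letter means the text cannot be "all uppercase".
  return false if text.match?(CASELESS_LETTER)

  # Otherwise fall back to a plain comparison: the text contains
  # uppercase letters and no lowercase ones.
  text != text.downcase && text == text.upcase
end

puts all_caps_with_caseless_guard?("如果你在UCPC微信群里")
puts all_caps_with_caseless_guard?("PLEASE READ THIS")
```

With this guard, a mixed-script message is never rejected for shouting, while a purely Latin all-caps message still is.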
Maybe a similar logic could also be used to optimize the existing check for all caps: if the text matches /\p{Ll}/ (lowercase letter that has an uppercase variant), then the text is not all caps.
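A rough sketch of that shortcut, with a hypothetical method name: a single character-class match rules out most texts before any copy of the string is upcased or downcased.

```ruby
# Hypothetical helper, not part of Discourse.
def all_caps_by_category?(text)
  # A lowercase letter (Ll) or a caseless letter (Lo) anywhere in the
  # text means it is not all caps; the scan stops at the first hit.
  return false if text.match?(/[\p{Ll}\p{Lo}]/)

  # Only texts with at least one uppercase letter (Lu) can be all caps;
  # digits and punctuation alone do not count.
  text.match?(/\p{Lu}/)
end

puts all_caps_by_category?("HELLO WORLD")
puts all_caps_by_category?("Hello world")
```

Note that a few lowercase letters in `Ll` have no single-character uppercase form, so this is an approximation of "has an uppercase variant" rather than an exact test.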
Sounds like a good idea for a pull request!
My Ruby chops are nearly non existent, but I can try to put something together as it is somewhat contained.
With that said, I’m seeing a TODO at the top of that file which seems related to this precise line of code. Is it as simple as removing the require, or should someone who knows what they are doing go for this PR?
I’m a long way from being a Ruby developer, so please bear with me.
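Great to see this issue resolved!
We run an international forum where English is the main language but there are dedicated sections for other languages, and this has been a long-standing annoyance.
Since skipped_locale is now only used for seems_unpretentious, I wonder whether “ko” could be left out of it, because modern Korean uses spaces? Note that I don’t speak Korean, so you may want to double-check this.
While you are at it, I think there is one more thing in TextSentinel that could easily be improved, but I don’t dare try it myself (again, I’m not a Ruby developer). If you have time, I think it is fairly simple and could bring a free performance gain.
As far as I understand, the check for words exceeding the length limit splits the text into words, computes the length of each word, scans all the lengths to find the maximum, and only then compares it with the limit.
Could we skip all of that by trying to match the text against something like /\p{Alnum}{#{max_word_length + 1},}/ (the syntax may be off, but hopefully you get the idea)?
Without knowing how Ruby works internally, this is more likely to stop checking as soon as a match is found, and if there is no overly long word (the most common case), the text is scanned only once, skipping the split, the per-word length checks, and so on.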
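The two approaches can be compared with a small sketch. Both method names are hypothetical, and the split-based version is only a reading of the description above, not the actual TextSentinel code.

```ruby
# Split-based check as described: split, measure every word, take the max.
def within_limit_by_split?(text, max_word_length)
  longest = text.split(/\s+/).map(&:length).max || 0
  longest <= max_word_length
end

# Regex-based check: look for a run of alphanumerics that is too long.
def within_limit_by_regex?(text, max_word_length)
  # The quantifier {n,} means "n or more"; matching stops at the first hit.
  !text.match?(/\p{Alnum}{#{max_word_length + 1},}/)
end

ok_text  = "a few short words"
bad_text = "one #{'x' * 40} word"

puts within_limit_by_split?(ok_text, 30) == within_limit_by_regex?(ok_text, 30)
puts within_limit_by_split?(bad_text, 30) == within_limit_by_regex?(bad_text, 30)
```

The two are not strictly equivalent: the split version counts punctuation attached to a word, while the regex only counts uninterrupted alphanumeric runs, so a long URL, for example, may be judged differently.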
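Sorry for hijacking the topic here, but since the new PR has already been merged I wasn’t sure where best to post this; it is probably too small to deserve a new topic, yet it seems like an easy improvement. Feel free to carry on.
I don’t know either, but I’d love to get confirmation from a Korean speaker.
That’s a brilliant idea.
Awesome!
Thank you for taking the time.
Maybe one of the Korean translators (@9bow, @alexkoala, @changukshin) can confirm whether modern Korean uses spaces between words like Roman/Latin scripts do, so that Discourse can rely on that assumption when checking Korean text for overly long words?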