Hope Watched words adds support for non-English characters

zogstrip · February 16, 2026 14:45

Thanks for the report. This will be fixed by:

github.com/discourse/discourse

FIX: support CJK and spaceless scripts in watched word boundaries (#37844)

main ← fix/watched-words-cjk-boundaries

opened 02:44PM - 16 Feb 26 UTC

ZogStriP

+84 -9

Watched words failed to match in CJK (Chinese, Japanese, Korean) and other spaceless scripts because word boundary detection relied on whitespace or non-word characters. Languages like Chinese don't use spaces between words, so "测试" inside "这是一个测试文本" was never matched.

Introduce a SPACELESS_SCRIPTS constant covering Han, Hiragana, Katakana, Hangul, Thai, Lao, Myanmar, Khmer, and Tibetan Unicode ranges. Update `match_word_regexp` for both Ruby and JS engines so that characters from these scripts are treated as word boundaries. This allows a CJK watched word to match when surrounded by other CJK characters, and a Latin watched word to match when adjacent to CJK text (e.g., "Test" in "我的Test很好"), while still preventing partial Latin matches (e.g., "Testing" does not match "Test").

Also fix the admin watched word testing modal to use `RegExp.exec()` with capture group extraction instead of `String.match()`, since the new boundary patterns include a leading consuming group.

Remove the outdated "non-chrome browsers do not support lookbehind" comment, since all major browsers have supported lookbehind since 2023.

https://meta.discourse.org/t/71288
https://meta.discourse.org/t/396109
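To make the boundary rule described in the PR concrete, here is a minimal JavaScript sketch. It is not the code from the PR: `SPACELESS`, `buildWatchedWordRegExp`, and `findWatchedWords` are hypothetical names, and Unicode script property escapes stand in for the explicit ranges that `SPACELESS_SCRIPTS` is said to contain. The pattern keeps the shape the description mentions, a leading consuming group followed by the captured word and a lookahead, which is why the word has to be read from capture group 1 rather than from the full match.

```javascript
// Hypothetical stand-in for SPACELESS_SCRIPTS: script property escapes
// instead of the explicit Unicode ranges mentioned in the PR description.
const SPACELESS =
  "\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}" +
  "\\p{Script=Hangul}\\p{Script=Thai}\\p{Script=Lao}" +
  "\\p{Script=Myanmar}\\p{Script=Khmer}\\p{Script=Tibetan}";

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a case-insensitive, Unicode-aware pattern for one watched word.
// A boundary is the start or end of the text, any character that is not a
// letter, digit, or underscore, or any character from a spaceless script.
function buildWatchedWordRegExp(word) {
  const before = `(?:^|[^\\p{L}\\p{N}_]|[${SPACELESS}])`; // consumes one char
  const after = `(?=$|[^\\p{L}\\p{N}_]|[${SPACELESS}])`; // consumes nothing
  return new RegExp(`${before}(${escapeRegExp(word)})${after}`, "giu");
}

// Collect every occurrence of the word, reading it from capture group 1
// because the full match also contains the leading boundary character.
function findWatchedWords(word, text) {
  if (word === "") {
    return []; // an empty word would match everywhere; treat it as no match
  }
  const regexp = buildWatchedWordRegExp(word);
  const found = [];
  let match;
  while ((match = regexp.exec(text)) !== null) {
    found.push(match[1]);
  }
  return found;
}

console.log(findWatchedWords("测试", "这是一个测试文本")); // [ '测试' ]
console.log(findWatchedWords("Test", "我的Test很好")); // [ 'Test' ]
console.log(findWatchedWords("Test", "Testing is fun")); // []
```

With this shape, `"我的Test很好".match(buildWatchedWordRegExp("Test"))` returns `["的Test"]`, the word plus the consumed boundary character, which illustrates why the description switches the testing modal to `RegExp.exec()` with capture group extraction. One limitation of the sketch: because the leading group consumes a character, two occurrences of the same word that touch each other (for example "测试测试") are reported only once.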

Related topics

Topic	Replies	Views	Activity
Russian characters in Watched Words list are failing to be properly identified Bug watched-words	1	550	February 10, 2021
Watched words: in Persian, content is affected without containing the word Support	6	777	May 9, 2019
Test Watched Words is Broken Bug watched-words	2	528	June 9, 2023
Accented characters cause false postives in Watched Words Bug watched-words	2	475	May 18, 2023
Censored words do not respect word boundaries in non-latin alphabet Bug	8	1553	November 29, 2018