Hope Watched words adds support for non-English characters

Noble_Fish · February 14, 2026, 3:31 PM

This is a useful moderation tool, but its support for non-English characters is poor, and the presence of non-English characters can even break detection of English words and numbers. Taking the Simplified Chinese word “测试” (test) as an example, the watched words list contains three entries: “测试”, “Test”, and “123”. In the test below, none of the three examples triggered a watched word.

I searched the site and found a similar issue about censored words: Censored words do not respect word boundaries in non-latin alphabet. Could this be a common problem across the whole watched words matching system?

zogstrip · February 16, 2026, 2:45 PM

Thanks for the report, this will be fixed by

github.com/discourse/discourse

FIX: support CJK and spaceless scripts in watched word boundaries (#37844)

main ← fix/watched-words-cjk-boundaries

opened 02:44PM - 16 Feb 26 UTC

ZogStriP

+84 -9

Watched words failed to match in CJK (Chinese, Japanese, Korean) and other spaceless scripts because word boundary detection relied on whitespace or non-word characters. Languages like Chinese don't use spaces between words, so "测试" inside "这是一个测试文本" was never matched.

Introduce a SPACELESS_SCRIPTS constant covering Han, Hiragana, Katakana, Hangul, Thai, Lao, Myanmar, Khmer, and Tibetan Unicode ranges. Update `match_word_regexp` for both Ruby and JS engines so that characters from these scripts are treated as word boundaries. This allows a CJK watched word to match when surrounded by other CJK characters, and a Latin watched word to match when adjacent to CJK text (e.g., "Test" in "我的Test很好"), while still preventing partial Latin matches (e.g., "Testing" does not match "Test").

Also fix the admin watched word testing modal to use `RegExp.exec()` with capture group extraction instead of `String.match()`, since the new boundary patterns include a leading consuming group.

Remove the outdated "non-chrome browsers do not support lookbehind" comment — all major browsers have supported lookbehind since 2023.

https://meta.discourse.org/t/71288
https://meta.discourse.org/t/396109
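For readers who want to see the idea in action, here is a minimal sketch of the boundary rule the PR description outlines. It is not the actual Discourse code: the names `SPACELESS_SCRIPT_CLASS`, `buildWatchedWordRegExp`, and `findWatchedWords` are hypothetical, and it assumes Unicode script property escapes (`\p{Script=…}`) as a stand-in for the explicit ranges in the PR's `SPACELESS_SCRIPTS` constant. The real pattern may differ in detail.

```javascript
// Hypothetical sketch, NOT the Discourse implementation.
// Assumption: script property escapes approximate the PR's SPACELESS_SCRIPTS ranges.
const SPACELESS_SCRIPT_CLASS =
  "\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\p{Script=Hangul}" +
  "\\p{Script=Thai}\\p{Script=Lao}\\p{Script=Myanmar}\\p{Script=Khmer}" +
  "\\p{Script=Tibetan}";

// Escape regex syntax characters so the watched word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// A boundary is either a character that is not a letter, digit, or underscore,
// or any character from a spaceless script. The leading boundary is a
// consuming group, so the watched word itself is captured in group 1.
function buildWatchedWordRegExp(word, flags = "iu") {
  const boundary = `[^\\p{L}\\p{N}_]|[${SPACELESS_SCRIPT_CLASS}]`;
  return new RegExp(
    `(?:^|${boundary})(${escapeRegExp(word)})(?=$|${boundary})`,
    flags
  );
}

// Collect matches with exec() and read capture group 1, so the consumed
// leading boundary character is not reported as part of the match.
function findWatchedWords(text, word) {
  const re = buildWatchedWordRegExp(word, "giu");
  const found = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    found.push(m[1]);
    if (m[0] === "") {
      re.lastIndex++; // guard against zero-length matches (e.g. empty word)
    }
  }
  return found;
}

console.log(findWatchedWords("这是一个测试文本", "测试")); // [ '测试' ]
console.log(findWatchedWords("我的Test很好", "Test")); // [ 'Test' ]
console.log(findWatchedWords("Testing", "Test")); // []

// With the global flag, String.match() returns whole matches, which include
// the consumed boundary character in front of the word.
console.log("这是一个测试文本".match(buildWatchedWordRegExp("测试", "gu"))); // [ '个测试' ]
```

The last line shows why reading the capture group matters once the pattern has a leading consuming group: the whole match is "个测试", while group 1 is just the watched word. One limitation of this sketch is that back-to-back occurrences of the same word can be missed, because the leading boundary character is consumed.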

Related topics

Topic	Replies	Views	Activity
Russian characters in Watched Words list are failing to be properly identified Bug watched-words	1	553	February 10, 2021
Watched words: in Persian, content is affected without containing the word Support	6	780	May 9, 2019
Test Watched Words is Broken Bug watched-words	2	538	June 9, 2023
Accented characters cause false postives in Watched Words Bug watched-words	2	485	May 18, 2023
Censored words do not respect word boundaries in non-latin alphabet Bug	8	1560	November 29, 2018