Rules for matching numbers in watched words

Noble_Fish · February 14, 2026 15:47

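As shown in the image below, the watched words being matched are "Test", "123", and "测试".

When a match succeeds, any digits immediately before the matched word are included in the match result. This applies to Unicode characters as well.
Digits immediately after the matched word do not behave this way.
This also affects matching of numbers, including strings made up only of English letters and digits. For example, "Test123" matches only "Test"; it matches neither "123" nor "Test123" as a whole.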

I may be a bit out of the loop here: what kind of matching rule is this? Could someone explain it to me?

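awesomerobot · February 20, 2026 19:42

This looks like a potential regex bug, and those are always a bit tricky. @zogstrip, maybe you could take a look, since you've worked in this area recently (although this seems to have been around for a while).

Here's how I understand it: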

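When we check a word, there are 3 parts that must match: leading, word, and trailing. Leading and trailing are defined as any non-letter character, which can be punctuation, whitespace, or digits. The digits are what make the matching go wrong. The intent was to catch the word even when it has punctuation or other characters right before or after it.

So the regex sees "123Test", finds "Test", looks before it and finds "3", which matches as a "non-letter", then looks after it and reaches the end of the string. As a result, it matches "3Test".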
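To make that walkthrough concrete, here is a small TypeScript sketch of the three-part check. It is a hypothetical reconstruction, not Discourse's actual pattern: it assumes the leading boundary is consumed (so it ends up in the match text) while the trailing boundary is only checked in a lookahead, which would also account for the first post's observation that digits before a word are pulled into the match but digits after it are not. `oldPattern` and `explain` are illustrative names.

```typescript
// Hypothetical reconstruction of the old JS-side check, NOT Discourse's
// actual source. A boundary only has to be a non-letter (\P{L}), so
// punctuation, whitespace and digits all qualify.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function oldPattern(word: string): RegExp {
  // Group 1 (leading): consumed, so it becomes part of the match text.
  // Group 2 (word): the watched word itself.
  // Group 3 (trailing): captured inside a lookahead, so it is checked but
  // not consumed (an assumption, mirroring the asymmetry in the first post).
  return new RegExp(`(^|\\P{L})(${escapeRegExp(word)})(?=(\\P{L}|$))`, "u");
}

interface Explained {
  match: string;
  leading: string;
  word: string;
  trailing: string;
}

function explain(word: string, text: string): Explained | null {
  const m = oldPattern(word).exec(text);
  if (m === null) return null;
  return { match: m[0], leading: m[1], word: m[2], trailing: m[3] };
}

explain("Test", "123Test"); // match "3Test": leading "3", trailing "" (end of string)
explain("Test", "Test123"); // match "Test": trailing "1" is checked but not included
explain("123", "Test123"); // null: the "t" before "123" is a letter
explain("测试", "123测试"); // match "3测试"
```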

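I think we need the leading/trailing parts to check for characters that are neither letters nor digits, to avoid this? I'm not sure whether there was a reason we didn't exclude digits, or whether it was just an oversight.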
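A minimal sketch of that suggestion, assuming the only change is the boundary class, which becomes "neither a letter nor a number" (`[^\p{L}\p{N}]`), while the rest of the hypothetical three-part shape stays as before. `BOUNDARY`, `suggestedPattern`, and `matches` are illustrative names.

```typescript
// Sketch of the suggested change (assumed shape): a boundary must be neither
// a letter nor a number. Apart from the class, the hypothetical three-part
// shape is unchanged: the leading boundary is consumed, the trailing one is
// only looked ahead at.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Not a letter and not a number (\p{N} covers digits in any script).
const BOUNDARY = "[^\\p{L}\\p{N}]";

function suggestedPattern(word: string): RegExp {
  return new RegExp(`(?:^|${BOUNDARY})${escapeRegExp(word)}(?=${BOUNDARY}|$)`, "u");
}

function matches(word: string, text: string): boolean {
  return suggestedPattern(word).test(text);
}

matches("Test", "123Test"); // false: "3" no longer counts as a boundary
matches("Test", "123 Test!"); // true: " " and "!" are still boundaries
matches("123", "Test123"); // false
matches("123", "Call 123."); // true
```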

Noble_Fish · February 20, 2026 19:44

Maybe this topic should be moved to the Contribute > Bug category?

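zogstrip · February 21, 2026 10:54

This should fix the problem for good. There used to be some inconsistencies between the regexes used by the Ruby and JS versions, but that split is no longer needed.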

github.com/discourse/discourse

FIX: unify watched word boundary regex across Ruby and JS engines (#37965)

main ← fix/unify-word-boundary-regex

opened 11:54PM - 20 Feb 26 UTC

ZogStriP

+122 -251

The CJK fix (d7a53ada16) introduced separate boundary patterns for Ruby and JS engines in `match_word_regexp`. The Ruby engine used `[:word:]` (which includes digits), while the JS engine used `\P{L}` (non-Letter). Since digits are not letters, the JS pattern treated them as valid word boundaries — causing "123Test" to match as "3Test" and standalone number watched words like "123" to match inside "abc123".

Replace both engine-specific patterns with a single unified pattern using Unicode property classes (`\p{L}`, `\p{M}`, `\p{N}`, `\p{Pc}`) that work identically in Ruby and JavaScript. This treats letters, marks, numbers, and connector punctuation as word characters in boundary checks, which fixes the number-matching bug for JS consumers while preserving the existing correct behavior on the Ruby side.

Since `match_word_regexp` no longer branches on engine, remove the now-dead `engine:` parameter from all 5 method signatures that threaded it through (`match_word_regexp`, `word_to_regexp`, `regexps_for_action`, `compiled_regexps_for_action`, `serialized_regexps_for_action`) and all call sites passing `engine: :js` (serializers, pretty_text).

https://meta.discourse.org/t/396110
https://meta.discourse.org/t/396109

Follow-up to d7a53ada16 (#37844)
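The PR description lists the Unicode property classes but not the final pattern, so the TypeScript sketch below is an assumption: it treats `\p{L}`, `\p{M}`, `\p{N}`, and `\p{Pc}` as word characters and puts a negative lookbehind and a negative lookahead on either side of the watched word, so the boundaries are checked without becoming part of the match. `WORD_CHAR`, `unifiedWordRegex`, and `findWatched` are hypothetical names, and this is not the regex merged in #37965.

```typescript
// Word characters as listed in the PR description: letters (\p{L}), marks
// (\p{M}), numbers (\p{N}) and connector punctuation (\p{Pc}, e.g. "_").
// The lookaround shape below is an assumption, not Discourse's actual regex.
const WORD_CHAR = "[\\p{L}\\p{M}\\p{N}\\p{Pc}]";

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function unifiedWordRegex(word: string): RegExp {
  // Lookbehind/lookahead inspect the neighbouring characters without
  // consuming them, so the match text is exactly the watched word.
  return new RegExp(`(?<!${WORD_CHAR})${escapeRegExp(word)}(?!${WORD_CHAR})`, "gu");
}

function findWatched(word: string, text: string): string[] {
  const re = unifiedWordRegex(word);
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    found.push(m[0]);
  }
  return found;
}

findWatched("Test", "123Test"); // []: digits are word characters now
findWatched("123", "abc123"); // []
findWatched("Test", "foo_Test"); // []: "_" is connector punctuation (\p{Pc})
findWatched("cafe", "cafe\u0301"); // []: U+0301 is a combining mark (\p{M})
findWatched("Test", "Hello, Test!"); // ["Test"]
findWatched("123", "123 and 123"); // ["123", "123"]
```

Because neither boundary is consumed, a result like "3Test" cannot appear, and a digit on either side of a watched word blocks the match in both directions.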

Related topics

Topic	Category / tags	Replies	Views	Activity
Hope Watched words adds support for non-English characters	Bug	1	84	February 16, 2026
Russian characters in Watched Words list are failing to be properly identified	Bug, watched-words	1	552	February 10, 2021
Can't enter watched words regex to catch phone numbers	Support, regex, watched-words	2	125	May 17, 2025
Watched word regular expression crash	Bug, watched-words	6	927	November 29, 2023
How to use Discourse regexes with watched words?	Support	6	2451	May 30, 2019