Number-matching rules in watched words

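Noble_Fish · February 14, 2026, 3:47 PM

As shown in the figure below, the watched words being matched are "Test", "123", and "测试" (Simplified Chinese for "Test").

When a match succeeds, any digit directly before the matched word is included in the match result. This also applies to Unicode characters.
Digits directly after the matched word do not show this behavior.
This also affects matching of numbers, even when the string consists only of letters and digits. For example, "Test123" matches only "Test"; it cannot match "123" or "Test123" (itself).

Maybe I'm a bit behind the times, but what kind of matching rule is this? Could someone explain?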

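awesomerobot · February 20, 2026, 7:42 PM

This may be a regex bug, which can be a bit tricky to debug. @zogstrip, you've worked in this area recently, could you take a look? (That said, this seems to have been around for a while.)

Here's my understanding...

When we check a word, there are three segments that have to match: the start, the word, and the end. We set the start and end to any non-letter character (punctuation, a space, or a digit). The digits are what throw the match off here. The intent is to catch a word even when it is surrounded by punctuation and the like.

So the regex looks at 123Test, finds Test, looks before it and sees 3, which matches as a "non-letter", then looks after it and finds the end of the word. So it matches 3Test.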
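A minimal sketch of that behavior, assuming a simplified "start, word, end" pattern in which both boundaries are `\P{L}` (any non-letter). The helper names `escapeRegExp`, `buggyWatchedWordRegExp`, and `fullMatch` are hypothetical and are not Discourse's actual code:

```javascript
// Hypothetical helper: escape regex metacharacters in a watched word.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Simplified three-segment pattern (an assumption, not the real source):
//   start: beginning of input OR one non-letter character (consumed)
//   word:  the watched word itself (capture group 1)
//   end:   end of input OR one non-letter character (lookahead, not consumed)
// \P{L} is "not a Unicode letter", so digits count as a valid boundary.
function buggyWatchedWordRegExp(word) {
  return new RegExp(`(?:^|\\P{L})(${escapeRegExp(word)})(?=$|\\P{L})`, "u");
}

// Return the whole matched text (including a consumed start character), or null.
function fullMatch(word, text) {
  const match = buggyWatchedWordRegExp(word).exec(text);
  return match ? match[0] : null;
}

console.log(fullMatch("Test", "123Test")); // "3Test": the digit before the word is swallowed
console.log(fullMatch("Test", "Test123")); // "Test": the digit after is only looked at
console.log(fullMatch("测试", "123测试")); // "3测试": same effect for non-ASCII words
```

Because the start segment consumes a character while the end segment is only a lookahead, a digit shows up in the result on the left side but never on the right, which is consistent with what the first post describes.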

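I think the start/end segments need to check for non-digit characters as well. I don't know whether there was a reason digits weren't accounted for, or whether it was just an oversight.

Noble_Fish · February 20, 2026, 7:44 PM

Should this topic perhaps be moved to the Contribute > Bug category?

zogstrip · February 21, 2026, 10:54 AM

This should fix it for real this time. The regexes used by the Ruby and JS versions had been inconsistent, and that split is no longer needed.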

github.com/discourse/discourse

FIX: unify watched word boundary regex across Ruby and JS engines (#37965)

main ← fix/unify-word-boundary-regex

opened 11:54PM - 20 Feb 26 UTC

ZogStriP

+122 -251

The CJK fix (d7a53ada16) introduced separate boundary patterns for Ruby and JS engines in `match_word_regexp`. The Ruby engine used `[:word:]` (which includes digits), while the JS engine used `\P{L}` (non-Letter). Since digits are not letters, the JS pattern treated them as valid word boundaries, causing "123Test" to match as "3Test" and standalone number watched words like "123" to match inside "abc123".

Replace both engine-specific patterns with a single unified pattern using Unicode property classes (`\p{L}`, `\p{M}`, `\p{N}`, `\p{Pc}`) that work identically in Ruby and JavaScript. This treats letters, marks, numbers, and connector punctuation as word characters in boundary checks, which fixes the number-matching bug for JS consumers while preserving the existing correct behavior on the Ruby side.

Since `match_word_regexp` no longer branches on engine, remove the now-dead `engine:` parameter from all 5 method signatures that threaded it through (`match_word_regexp`, `word_to_regexp`, `regexps_for_action`, `compiled_regexps_for_action`, `serialized_regexps_for_action`) and all call sites passing `engine: :js` (serializers, pretty_text).

https://meta.discourse.org/t/396110
https://meta.discourse.org/t/396109

Follow-up to d7a53ada16 (#37844)
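For illustration only, here is a rough sketch of what a unified boundary check built from those four property classes could look like on the JS side. The exact pattern in the PR may differ; `WORD_CHARS`, `escapeRegExp`, `unifiedWatchedWordRegExp`, and `capturedWord` are hypothetical names, and the overall shape of the regex is an assumption:

```javascript
// Hypothetical helper: escape regex metacharacters in a watched word.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Characters treated as "part of a word": letters, marks, numbers,
// and connector punctuation (for example the underscore).
const WORD_CHARS = "\\p{L}\\p{M}\\p{N}\\p{Pc}";

// Assumed shape: a boundary is the start/end of input or any character
// that is NOT a word character. Digits are no longer valid boundaries.
function unifiedWatchedWordRegExp(word) {
  return new RegExp(
    `(?:^|[^${WORD_CHARS}])(${escapeRegExp(word)})(?=$|[^${WORD_CHARS}])`,
    "u"
  );
}

// Return only the captured watched word (group 1), or null when nothing matches.
function capturedWord(word, text) {
  const match = unifiedWatchedWordRegExp(word).exec(text);
  return match ? match[1] : null;
}

console.log(capturedWord("Test", "123Test"));      // null
console.log(capturedWord("Test", "Hello, Test!")); // "Test"
console.log(capturedWord("Test123", "Test123"));   // "Test123"
console.log(capturedWord("123", "abc123"));        // null
console.log(capturedWord("123", "call 123 now"));  // "123"
```

Reading capture group 1 instead of the full match also keeps any consumed boundary character (such as a leading space) out of the reported word.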
