Rules for Number Matching in Watched Words

Noble_Fish · February 14, 2026, 3:47pm

As shown in the figure below, the matching words are “Test”, “123”, and “测试” (which means “Test” in Simplified Chinese).

When a match succeeds, a number immediately preceding the matched word is included in the match result; this applies even to Unicode characters.
Numbers immediately following the matched word are not included.
This also affects matching of numeric watched words, including strings that consist solely of English letters and digits. For example, “Test123” matches only “Test”; it matches neither “123” nor “Test123” (itself).

I might be a bit out of the loop here—what kind of matching rule is this? Could someone explain it to me?

awesomerobot · February 20, 2026, 7:42pm

Seems like a possible regex bug, and those are always a little tricky to untangle. @zogstrip maybe you can have a look since you’ve worked in this area recently (though this seems to have existed for a while).

Here’s my understanding…

When we’re checking words, there are 3 segments that have to match: leading, the word, trailing. We’ve set leading and trailing to be non-letter characters… these could be punctuation, spaces, OR numbers. The numbers are what’s throwing off the match here. The intention is to be able to catch words even if there’s punctuation or whatever before/after the word.

So the regex sees 123Test, finds Test, looks before it and finds 3 and that matches as a “non-letter”, then looks after and finds the end of the word. So it matches on 3Test.

I think we need to check for non-letter AND non-number characters in the leading/trailing segments to avoid this? Not sure if there’s a reason we didn’t include numbers or if it’s just an oversight.
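
To make the behaviour described above concrete, here is a minimal sketch of a three-segment pattern (leading, word, trailing) where the boundary is "any non-letter". It is a simplified illustration, not the actual Discourse pattern: the names `legacyJsPattern` and `legacyMatch` are hypothetical, and the choice to consume the leading character while only looking ahead at the trailing one is an assumption made to mirror the reported results.

```javascript
// Hypothetical simplification of a "non-letter boundary" pattern.
// Not the real Discourse regex; it only illustrates the reported behaviour.
function legacyJsPattern(word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Leading: start of text OR one non-letter character (digits qualify).
  // Trailing: end of text OR a non-letter, checked without consuming it.
  return new RegExp(`(?:^|\\P{L})${escaped}(?=$|\\P{L})`, "iu");
}

// Returns the matched text, or null when nothing matches.
function legacyMatch(word, text) {
  const m = text.match(legacyJsPattern(word));
  return m ? m[0] : null;
}

console.log(legacyMatch("Test", "123Test")); // "3Test" (the digit is swallowed)
console.log(legacyMatch("Test", "Test123")); // "Test"
console.log(legacyMatch("123", "Test123")); // null ("t" is a letter, so no boundary)
```

In this sketch the digit "3" satisfies the leading segment because it is not a letter, which is why it ends up inside the match.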

Noble_Fish · February 20, 2026, 7:44pm

Maybe this topic needs to be moved to the Bug category?

zogstrip · February 21, 2026, 10:54am

This should fix it for good this time. We’ve had some inconsistency between the Ruby version and the JS version of the regexps used, but having two versions is now unnecessary.

github.com/discourse/discourse

FIX: unify watched word boundary regex across Ruby and JS engines (#37965)

main ← fix/unify-word-boundary-regex

merged 03:30PM - 23 Feb 26 UTC

ZogStriP

+122 -251

The CJK fix (d7a53ada16) introduced separate boundary patterns for Ruby and JS engines in `match_word_regexp`. The Ruby engine used `[:word:]` (which includes digits), while the JS engine used `\P{L}` (non-Letter). Since digits are not letters, the JS pattern treated them as valid word boundaries — causing "123Test" to match as "3Test" and standalone number watched words like "123" to match inside "abc123".

Replace both engine-specific patterns with a single unified pattern using Unicode property classes (`\p{L}`, `\p{M}`, `\p{N}`, `\p{Pc}`) that work identically in Ruby and JavaScript. This treats letters, marks, numbers, and connector punctuation as word characters in boundary checks, which fixes the number-matching bug for JS consumers while preserving the existing correct behavior on the Ruby side.

Since `match_word_regexp` no longer branches on engine, remove the now-dead `engine:` parameter from all 5 method signatures that threaded it through (`match_word_regexp`, `word_to_regexp`, `regexps_for_action`, `compiled_regexps_for_action`, `serialized_regexps_for_action`) and all call sites passing `engine: :js` (serializers, pretty_text).

https://meta.discourse.org/t/396110
https://meta.discourse.org/t/396109

Follow-up to d7a53ada16 (#37844)
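
For readers who want to try the idea, the unified boundary described in the PR text can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: the names `WORD_CHAR`, `escapeWord`, `unifiedPattern`, and `firstMatch` are hypothetical, and the use of lookbehind/lookahead is an assumption about how one might express the boundary. The only part taken from the PR description is the set of classes treated as word characters (`\p{L}`, `\p{M}`, `\p{N}`, `\p{Pc}`).

```javascript
// Hypothetical sketch, not the actual Discourse implementation.
// Word characters per the PR description: letters, marks, numbers,
// and connector punctuation (for example "_").
const WORD_CHAR = "[\\p{L}\\p{M}\\p{N}\\p{Pc}]";

// Escape regex metacharacters so the watched word is matched literally.
function escapeWord(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumption: express the boundary with lookarounds so that neighbouring
// characters are checked but never become part of the match.
function unifiedPattern(word) {
  return new RegExp(
    `(?<!${WORD_CHAR})${escapeWord(word)}(?!${WORD_CHAR})`,
    "iu"
  );
}

// Returns the matched text, or null when nothing matches.
function firstMatch(word, text) {
  const m = text.match(unifiedPattern(word));
  return m ? m[0] : null;
}

console.log(firstMatch("Test", "123Test")); // null (a digit is a word character)
console.log(firstMatch("123", "abc123")); // null
console.log(firstMatch("Test123", "Test123")); // "Test123"
console.log(firstMatch("Test", "Hello, Test!")); // "Test"
```

Because digits now count as word characters on both sides, a watched word in this sketch only matches when it stands alone between non-word characters or at the edges of the text.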

