Rules for Number Matching in Watched Words

As shown in the figure below, the matching words are “Test”, “123”, and “测试” (which means “Test” in Simplified Chinese).

When a match succeeds, any number immediately preceding the matched word is included in the match result. This applies to Unicode characters as well.
Numbers immediately following the matched word are not included in the same way.
This also affects matching of the numbers themselves, even in strings made up solely of English letters and digits. For example, “Test123” only matches “Test”; it matches neither “123” nor “Test123” (itself).

I might be a bit out of the loop here—what kind of matching rule is this? Could someone explain it to me? :melting_face:


Seems like a possible regex bug, and those are always a little tricky to untangle. @zogstrip maybe you can have a look since you’ve worked in this area recently (though this seems to have existed for a while).

Here’s my understanding…

When we’re checking words, there are 3 segments that have to match: leading, the word, trailing. We’ve set leading and trailing to be non-letter characters… these could be punctuation, spaces, OR numbers. The numbers are what’s throwing off the match here. The intention is to be able to catch words even if there’s punctuation or whatever before/after the word.

So the regex sees 123Test, finds Test, looks before it and finds 3 and that matches as a “non-letter”, then looks after and finds the end of the word. So it matches on 3Test.

I think we need to check for non-letter AND non-number characters in the leading/trailing segments to avoid this? Not sure if there’s a reason we didn’t include numbers or if it’s just an oversight.
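To make that concrete, here is a minimal Python sketch of the idea. It is not the actual Discourse regex (that lives in Ruby and JS); the `find` helper and the two boundary classes are hypothetical names. It also assumes the leading segment is consumed by the match while the trailing segment is only a lookahead, which would explain why only preceding digits end up in the result.

```python
import re

# Assumption: the current "non-letter" boundary, approximated as
# "non-word character, digit, or underscore". Digits count as a boundary.
NON_LETTER = r"[\W\d_]"

# Assumption: a stricter boundary that excludes both letters and digits.
NON_WORD = r"\W"


def find(word, text, boundary):
    """Return the full matched text for `word` in `text`, or None.

    The leading segment is consumed (so it shows up in the result);
    the trailing segment is a lookahead (so it does not).
    """
    pattern = rf"(?:^|{boundary}){re.escape(word)}(?=$|{boundary})"
    match = re.search(pattern, text)
    return match.group(0) if match else None


# Digits treated as a boundary: the preceding "3" is pulled into the match.
print(find("Test", "123Test", NON_LETTER))
print(find("测试", "123测试", NON_LETTER))

# A trailing digit satisfies the lookahead but is not part of the match.
print(find("Test", "Test123", NON_LETTER))

# "123" is preceded by the letter "t", so the leading segment fails.
print(find("123", "Test123", NON_LETTER))

# Stricter boundary: "123Test" no longer matches, "a Test!" still does.
print(find("Test", "123Test", NON_WORD))
print(find("Test", "a Test!", NON_WORD))
```

Under these assumptions, the stricter boundary stops matching “Test” inside “123Test” or “Test123”, while punctuation and spaces around the word still count as boundaries.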


Maybe this topic needs to be moved to the Bug category?


This should fix it for good this time. We’ve had some inconsistency between the Ruby version and the JS version of the regexes used, but that is now unnecessary.
