Russian characters in Watched Words list are failing to be properly identified

I’ve been expanding the watched words list for our company and found an odd issue. We’d like to use the watched words list for all supported languages, but it flags certain words that are fine in Russian, apparently because it does not detect all of the characters in the word.

Example 1: Regular watched words with English characters work fine

watchedwords1

Example 2: If I add a character to the front of the word, it is no longer flagged (which is working as intended)

watchedwords2

Example 3: But certain Russian letters look identical to English characters while seeming to have different Unicode code points, so they do not appear to be recognized as part of the word.

image (21)
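For anyone trying to reproduce this, here is a minimal Python sketch for checking which character was actually typed. It only relies on the standard `unicodedata` module; the helper name `describe` is made up for illustration. Cyrillic "а" and Latin "a" render the same in most fonts but are separate code points.

```python
import unicodedata

def describe(ch):
    # Return the code point and the official Unicode name of one character.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Escapes are used so the two look-alike letters cannot be confused in source.
print(describe("\u0430"))  # Cyrillic letter
print(describe("\u0061"))  # Latin letter
```

Pasting a suspicious word into a loop over `describe` shows quickly whether it mixes scripts.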

абля is being improperly flagged even though it is not on the list. Deleting the “a” and re-typing it on an English keyboard results in the word no longer being flagged (likely because the character is then encoded differently). As a result, perfectly fine words are being flagged, which is undesired.

Another example is себ being improperly flagged in the same manner, when only еб is on the watched words list.
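One possible explanation, offered as a guess rather than a description of the actual watched-words code, is that the boundary check uses an ASCII-only notion of "word character". Under that assumption a Cyrillic letter counts as a non-word character, so it looks like a boundary. The Python sketch below uses a hypothetical pattern and a hypothetical helper `is_flagged` to show the difference between ASCII-only and Unicode-aware matching.

```python
import re

def is_flagged(word, text, ascii_only):
    # Hypothetical boundary rule: the watched word must be preceded by a
    # non-word character (or the start of the text) and followed by a
    # non-word character (or the end of the text).
    flags = re.IGNORECASE | (re.ASCII if ascii_only else 0)
    pattern = r"(?:\W|^)" + re.escape(word) + r"(?=\W|$)"
    return re.search(pattern, text, flags) is not None

# ASCII-only: the leading Cyrillic letter is treated as a non-word character,
# so the shorter watched word appears to stand on its own.
print(is_flagged("еб", "себ", ascii_only=True))
# Unicode-aware: the leading letter is a word character, so nothing matches.
print(is_flagged("еб", "себ", ascii_only=False))
```

With `ascii_only=True` a leading Latin letter still blocks the match while a leading Cyrillic one does not, which is consistent with the behaviour described above. Whether this is the real cause would need to be confirmed against the actual implementation.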

If anyone has workaround suggestions for this I’d be happy to hear them! Thanks :slight_smile:


Hi @CCP_Aurora, we will have a look. I recall that getting the regexes to work properly with Unicode and handle boundaries correctly was a bit of an adventure. This certainly looks like a bug.

@gerhard may have some ideas as well; I recall he worked on similar issues in the past.
