How to use Discourse regexes with watched words?

justin · May 29, 2019, 6:04pm

Discourse Regexes (Watched Words)

To use regular expressions (regex) in watched words you must first turn on the watched words regular expressions site setting.

By default, Discourse matches a word entered as a regular expression case-insensitively. For example:

thread

This will match thread, THREAD, and thReAd.

(t|7)hr(3|e)(4|a)d

This will match all of the cases above, plus thr3ad, 7hread, and thr34d.
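If you want to sanity-check these two patterns outside Discourse, here is a minimal sketch in plain JavaScript. It assumes the case-insensitive behaviour can be approximated with the `i` flag; the `matches` helper is hypothetical and not something Discourse provides.

```javascript
// Hypothetical helper: approximates Discourse's case-insensitive matching
// with the JavaScript `i` flag (an assumption, not Discourse's actual code).
function matches(pattern, text) {
  return new RegExp(pattern, "i").test(text);
}

const plain = "thread";
const leet = "(t|7)hr(3|e)(4|a)d";

// Print, for each sample word, whether each pattern finds a match.
for (const word of ["thread", "THREAD", "thReAd", "thr3ad", "7hread", "thr34d"]) {
  console.log(word, "plain:", matches(plain, word), "leet:", matches(leet, word));
}
```

The plain pattern should only report the three spellings that differ by case, while the second pattern should report all six.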

threads?\S+

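This is meant to match thread and threads but not threaded or threading. As written, though, the trailing \S+ demands at least one more non-whitespace character, so it also matches threaded and threading and misses a bare thread. Plain threads? covers both thread and threads.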

However, there’s a glaring error in ALL the above examples! The words threadlike and unthreading are matched (un▪️▪️▪️▪️▪️ing), even though they’re not referring to thread. How do we fix that?

We’d have to amend our regex to handle word boundaries.

\bthreads?\b

This looks for boundaries around the word so that unthreading or threadlike aren’t caught by the filter, but thread and threads still are.
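A quick way to see the difference the boundaries make is to run both versions against the same words. This is only a sketch in plain JavaScript, again assuming the `i` flag is a fair stand-in for how Discourse applies the pattern.

```javascript
// Assumption: the `i` flag stands in for Discourse's case-insensitive matching.
const unanchored = new RegExp("threads?", "i");
const bounded = new RegExp("\\bthreads?\\b", "i");

const samples = ["thread", "threads", "threadlike", "unthreading"];
const boundaryResults = samples.map((word) => ({
  word,
  unanchored: unanchored.test(word), // finds "thread" anywhere inside the word
  bounded: bounded.test(word),       // requires a word boundary on both sides
}));
console.table(boundaryResults);
```

The unanchored pattern reports a match for all four words, while the bounded one only reports thread and threads.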

For handling Unicode characters

gr(ü|ue)(ß|ss)e

This matches all commonly spelled forms of the word grüße, including gruesse and GRÜSSE.
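Here is a small sketch for trying the spellings out in plain JavaScript, assuming the `i` flag approximates Discourse's case-insensitive matching. It also shows why both alternatives need to be written out.

```javascript
// Assumption: the `i` flag stands in for Discourse's case-insensitive matching.
const greeting = new RegExp("gr(ü|ue)(ß|ss)e", "i");

const greetingResults = ["grüße", "gruesse", "GRÜSSE", "Grüsse"].map((word) => ({
  word,
  matched: greeting.test(word),
}));
console.table(greetingResults);

// Case-insensitivity alone does not equate ß with ss, which is why the
// pattern spells out both alternatives.
const literalOnly = new RegExp("grüße", "i").test("GRÜSSE");
console.log("literal grüße vs GRÜSSE:", literalOnly);
```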

Say we want to block the word Über, but not Übersicht. Using word boundaries like \b(ü|ue)ber\b doesn’t work, because \b in JavaScript regexes only treats ASCII letters, digits, and underscores as word characters, so it finds no boundary in front of a non-ASCII letter such as Ü.

Instead we have to make our own boundaries.

(?:^|\s)(ü|ue)ber\b

This will now appropriately match Über and ueber, but not Übersicht or uebersicht.
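To compare the two approaches side by side, here is a sketch in plain JavaScript. As before, the `i` flag is an assumption standing in for Discourse's case-insensitive matching.

```javascript
// Assumption: the `i` flag stands in for Discourse's case-insensitive matching.
const withWordBoundary = new RegExp("\\b(ü|ue)ber\\b", "i");
const withCustomBoundary = new RegExp("(?:^|\\s)(ü|ue)ber\\b", "i");

const unicodeWords = ["Über", "ueber", "Übersicht", "uebersicht", "das ist über"];
const unicodeResults = unicodeWords.map((text) => ({
  text,
  // \b only knows ASCII word characters, so it sees no boundary before Ü.
  wordBoundary: withWordBoundary.test(text),
  // Start of string or whitespace is used as the leading boundary instead.
  customBoundary: withCustomBoundary.test(text),
}));
console.table(unicodeResults);
```

Note that the custom boundary makes any leading whitespace part of the match, which is worth keeping in mind if the matched text is replaced rather than just flagged.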

A final warning

Regex is extremely powerful and thus dangerous. An incorrectly written regex statement can cause issues for your users. Test your regex statements on non-production instances before going live.
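Before pasting a pattern into a test instance, it can help to run it against a list of words you expect it to catch and a list you expect it to leave alone. The sketch below is plain JavaScript; `checkPattern` is a hypothetical helper, not a Discourse feature, and the `i` flag is an assumption about how the pattern is applied. It does not replace testing on a non-production instance.

```javascript
// Hypothetical pre-flight check; not a Discourse feature. The `i` flag is an
// assumption standing in for Discourse's case-insensitive matching.
function checkPattern(pattern, shouldMatch, shouldNotMatch) {
  const regex = new RegExp(pattern, "i"); // throws SyntaxError on an invalid pattern
  const failures = [];
  for (const text of shouldMatch) {
    if (!regex.test(text)) failures.push(`expected a match: ${text}`);
  }
  for (const text of shouldNotMatch) {
    if (regex.test(text)) failures.push(`unexpected match: ${text}`);
  }
  return failures;
}

const failures = checkPattern(
  "\\bthreads?\\b",
  ["thread", "Threads", "a new thread here"],
  ["threadlike", "unthreading"]
);
console.log(failures.length === 0 ? "all checks passed" : failures);
```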
