Discourse Regexes (Watched Words)
To use regular expressions (regex) in watched words, you must first turn on the watched words regular expressions site setting.
Discourse by default matches all uppercase and lowercase forms of a word entered as a regular expression. That is,
thread
This will match thread, THREAD, and thReAd.
(t|7)hr(3|e)(4|a)d
This will match all of the cases above, plus thr3ad, 7hread, and thr34d.
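The two patterns above can be tried outside Discourse with a short JavaScript sketch. It assumes that Discourse's case-insensitive matching behaves like the JavaScript i flag; how Discourse actually compiles watched words is not shown here, so treat the flag and the variable names as illustrative.

```javascript
// Assumption: the "i" flag stands in for Discourse's case-insensitive
// matching. The names plain, leet, and samples are illustrative only.
const plain = /thread/i;
const leet = /(t|7)hr(3|e)(4|a)d/i;

const samples = ["thread", "THREAD", "thReAd", "thr3ad", "7hread", "thr34d"];

for (const word of samples) {
  // plain only knows the letters t-h-r-e-a-d; leet also accepts 7, 3, and 4.
  console.log(word, "plain:", plain.test(word), "leet:", leet.test(word));
}
```

The plain pattern accepts only the first three samples, while the pattern with alternations accepts all six.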
threads?
The s? makes the trailing s optional, so this targets thread and threads, and is not meant to catch threaded or threading.
However, there’s a glaring error in ALL the above examples! The words threadlike and unthreading are matched (un▪️▪️▪️▪️▪️ing), even though they’re not referring to thread. How do we fix that?
We’d have to amend our regex to handle word boundaries.
\bthreads?\b
This looks for boundaries around the word so that unthreading or threadlike aren’t caught by the filter, but thread and threads still are.
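A quick way to see the difference the boundaries make is to run the unbounded and bounded patterns side by side. This is a minimal sketch in plain JavaScript, again assuming the i flag approximates Discourse's case-insensitive matching; the variable names are made up for the example.

```javascript
// Assumption: the "i" flag models Discourse's case-insensitive matching.
const loose = /threads?/i;        // matches anywhere inside a longer word
const bounded = /\bthreads?\b/i;  // must start and end at a word boundary

const words = [
  "thread", "threads", "threaded", "threading", "threadlike", "unthreading",
];

for (const word of words) {
  console.log(word, "loose:", loose.test(word), "bounded:", bounded.test(word));
}
```

The unbounded pattern reports a match for every word in the list, because each one contains thread as a substring. The bounded pattern reports a match only for thread and threads.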
For handling Unicode characters
gr(ü|ue)(ß|ss)e
This matches all commonly spelled forms of the word grüße, including gruesse and GRÜSSE.
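The spellings covered by this pattern can be checked with a small JavaScript sketch. It assumes the i flag is a fair stand-in for Discourse's case-insensitive matching; the variable names are illustrative.

```javascript
// Assumption: the "i" flag models Discourse's case-insensitive matching.
const greeting = /gr(ü|ue)(ß|ss)e/i;

const spellings = ["grüße", "Grüße", "grüsse", "gruesse", "GRÜSSE"];
const unrelated = ["Gruss", "grau"];

for (const word of [...spellings, ...unrelated]) {
  console.log(word, greeting.test(word));
}
```

Each alternation lists the accepted spellings of one sound, so any combination of ü or ue with ß or ss is accepted.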
Say we want to block the word Über, but not Übersicht. Using word boundaries like \b(ü|ue)ber\b doesn’t work because \b in JavaScript regex only treats ASCII letters, digits, and underscores as word characters, so it never sees a boundary in front of ü.
Instead we have to make our own boundaries.
(?:^|\s)(ü|ue)ber\b
This will now appropriately match Über and ueber, but not Übersicht or uebersicht.
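The behavior of the two boundary styles can be compared in plain JavaScript. This is a sketch under the assumption that the i flag models Discourse's case-insensitive matching; the variable names are hypothetical.

```javascript
// Assumption: the "i" flag models Discourse's case-insensitive matching.
const asciiBoundary = /\b(ü|ue)ber\b/i;        // \b only knows ASCII word characters
const customBoundary = /(?:^|\s)(ü|ue)ber\b/i; // start of text or whitespace instead

const samples = ["Über", "ueber", "Übersicht", "uebersicht", "ganz über uns"];

for (const text of samples) {
  console.log(
    text,
    "ascii:", asciiBoundary.test(text),
    "custom:", customBoundary.test(text)
  );
}
```

With \b on both sides, only ueber is caught, because ü does not count as a word character. With the hand-made leading boundary, Über, ueber, and the über in the last sample are all caught, and Übersicht and uebersicht are not. Note that the whitespace in front of the word becomes part of the match, since the non-capturing group consumes it.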
A final warning
Regex is extremely powerful and thus dangerous. An incorrectly written regex can match far more than you intended and cause issues for your users. Test your regexes on a non-production instance before going live.
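Before trying a pattern on a test instance, it can help to check it against lists of words you expect to be caught and words you expect to be left alone. The helper below is a hypothetical sketch, not part of Discourse; checkPattern and its parameters are invented for this example, and the i flag is an assumption standing in for Discourse's case-insensitive matching.

```javascript
// Hypothetical helper, not a Discourse API: reports words that behave
// differently from what you expect for a given pattern.
function checkPattern(source, shouldMatch, shouldNotMatch) {
  // Assumption: "i" models Discourse's case-insensitive matching.
  const regex = new RegExp(source, "i");
  return {
    missed: shouldMatch.filter((word) => !regex.test(word)),
    overMatched: shouldNotMatch.filter((word) => regex.test(word)),
  };
}

// Backslashes are doubled because the pattern is written as a string.
const report = checkPattern(
  "\\bthreads?\\b",
  ["thread", "Threads"],
  ["threadlike", "unthreading"]
);

console.log(report);
```

Both lists in the report are empty for this pattern. Running the same check with threads? instead lists threadlike and unthreading under overMatched, which is the kind of mistake worth catching early. A local check like this does not replace testing on a non-production instance.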