This topic explains how to effectively use regular expressions (regex) within Discourse’s Watched Words feature.
Regular expressions (regex) are a powerful tool for defining search patterns. You can use them in the Watched Words feature to make word filtering on your Discourse site more accurate and flexible.
To use regular expressions in watched words, you must first turn on the watched words regular expressions site setting. If your site is hosted by Discourse, contact team@discourse.org to have this setting enabled.
Because regex is so powerful, it is also risky: an incorrectly written pattern can match legitimate posts and apply the watched word action to them. Test your patterns on a non-production instance before going live.
Example Regex patterns
Here are some common regex patterns and how they can be employed:
Case-insensitivity
By default, Discourse matches both uppercase and lowercase forms of a word.
thread
This will match thread, THREAD, and thReAd.
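To experiment with this outside Discourse, you can model the case-insensitive default in plain JavaScript with the i flag. This is a sketch for local testing under that assumption, not Discourse's own matching code:

```javascript
// Assumption: case-insensitive matching is modelled with the "i" flag.
// There is no "g" flag, so test() keeps no state between calls.
const watched = new RegExp("thread", "i");

const samples = ["thread", "THREAD", "thReAd", "unthreading"];
for (const text of samples) {
  console.log(`${text}: ${watched.test(text)}`);
}
```

Every sample prints true, including unthreading, because a plain word also matches inside longer words.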
Character alternatives
Use character alternatives to expand your matches.
(t|7)hr(3|e)(4|a)d
This will match all of the cases above, plus thr3ad, 7hread, and thr34d.
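A quick JavaScript check of the alternatives pattern, again assuming the i flag stands in for the case-insensitive default:

```javascript
// Sketch, assuming case-insensitive matching via the "i" flag.
const alternatives = /(t|7)hr(3|e)(4|a)d/i;

const samples = ["thread", "thr3ad", "7hread", "thr34d", "7HR3AD", "thrad"];
for (const text of samples) {
  console.log(`${text}: ${alternatives.test(text)}`);
}
```

thrad prints false: the pattern still requires 3 or e after thr.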
threads?\b
The ? makes the s optional, and the trailing \b stops the match from continuing into a longer word. This will match thread and threads but not threaded or threading.
Word boundaries
Regex patterns can unintentionally match parts of words. Use word boundaries to avoid partial matches.
\bthreads?\b
This matches thread and threads but avoids matches like threadlike or unthreading: the leading \b rules out unthreading, and the trailing \b rules out threadlike.
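Comparing the pattern with and without boundaries in JavaScript makes the difference visible. This sketch assumes case-insensitive matching via the i flag:

```javascript
// Sketch comparing a pattern with and without word boundaries,
// assuming case-insensitive matching via the "i" flag.
const loose = /threads?/i;
const bounded = /\bthreads?\b/i;

const samples = ["threads", "a thread.", "threadlike", "unthreading"];
for (const text of samples) {
  console.log(`${text}: loose=${loose.test(text)} bounded=${bounded.test(text)}`);
}
```

Punctuation is not a word character, so "a thread." still matches the bounded version.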
Handling Unicode characters
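Standard word boundaries can fail next to Unicode characters: JavaScript's \b only treats ASCII letters, digits, and the underscore as word characters, so it does not recognize letters such as ü or ö. For those, you need to create your own boundaries.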
gr(ü|ue)(ß|ss)e
This matches all commonly spelled forms of the word grüße, including gruesse and GRÜSSE.
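A JavaScript sketch of the same check, assuming case-insensitive matching via the i flag. Under that flag ü also matches Ü, and ss matches SS:

```javascript
// Sketch, assuming case-insensitive matching via the "i" flag.
const greeting = /gr(ü|ue)(ß|ss)e/i;

const samples = ["grüße", "gruesse", "GRÜSSE", "grüsse", "grusse"];
for (const text of samples) {
  // grusse fails: plain u is neither ü nor ue
  console.log(`${text}: ${greeting.test(text)}`);
}
```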
Say you want to block the word Über, but not Übersicht. Using word boundaries like \b(ü|ue)ber\b doesn't work, because \b does not treat Ü as a word character, so there is no boundary for it to match before Über. Instead you have to make your own boundaries.
(?:^|\s)(ü|ue)ber\b
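Here (?:^|\s) replaces the leading \b: the word must start at the beginning of the text or right after a whitespace character. The trailing \b still works because r is an ASCII letter. This will now match Über and ueber, but not Übersicht or uebersicht.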
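The following JavaScript sketch puts the built-in \b and the hand-made boundary side by side, assuming case-insensitive matching via the i flag:

```javascript
// Sketch comparing \b with a hand-made leading boundary,
// assuming case-insensitive matching via the "i" flag.
const builtIn = /\b(ü|ue)ber\b/i;
const custom = /(?:^|\s)(ü|ue)ber\b/i;

const samples = ["Über", "ueber", "ich bin über", "Übersicht", "uebersicht"];
for (const text of samples) {
  console.log(`${text}: \\b=${builtIn.test(text)} custom=${custom.test(text)}`);
}
```

Only the hand-made version matches Über at the start of the text and über after a space; both reject Übersicht and uebersicht.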
Catching deliberate character substitutions
To catch words where users substitute numbers or special characters for letters:
\bp[a@]ssw[o0]rd\b
This matches: password, p@ssword, passw0rd, p@ssw0rd, but not mypassword or password123.
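A short JavaScript check of the substitution pattern, assuming case-insensitive matching via the i flag:

```javascript
// Sketch, assuming case-insensitive matching via the "i" flag.
const password = /\bp[a@]ssw[o0]rd\b/i;

const samples = ["p@ssw0rd", "PASSW0RD", "mypassword", "password123"];
for (const text of samples) {
  console.log(`${text}: ${password.test(text)}`);
}
```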
Handling characters with punctuation in between
To catch attempts to evade filters by inserting punctuation:
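\bs[\W_]*p[\W_]*a[\W_]*m\b
This matches: spam, s.p.a.m, s-p-a-m, s_p_a_m, but not spammy or myspam. The class [\W_] is used instead of \W alone because the underscore counts as a word character, so \W* would not catch s_p_a_m.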
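A JavaScript sketch for checking the punctuation-tolerant idea, assuming case-insensitive matching via the i flag. It compares \W* with [\W_]* between the letters, because \W does not include the underscore:

```javascript
// Sketch, assuming case-insensitive matching via the "i" flag.
// \W excludes "_", because the underscore counts as a word character.
const nonWordOnly = /\bs\W*p\W*a\W*m\b/i;
const withUnderscore = /\bs[\W_]*p[\W_]*a[\W_]*m\b/i;

const samples = ["s.p.a.m", "s-p-a-m", "s_p_a_m", "spammy"];
for (const text of samples) {
  console.log(`${text}: \\W*=${nonWordOnly.test(text)} [\\W_]*=${withUnderscore.test(text)}`);
}
```

Only the [\W_]* version catches s_p_a_m.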
Matching multiple word variations
For matching phrases that might appear with different word forms:
\b(contact|email|reach)( us| me)?\b
This matches: contact, contact us, contact me, email, email us, email me, reach, reach us, reach me.
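In JavaScript, match() shows which part of a post the phrase pattern actually picked up. This sketch assumes case-insensitive matching via the i flag:

```javascript
// Sketch, assuming case-insensitive matching via the "i" flag.
// match() returns the matched text, or null when nothing matches.
const contact = /\b(contact|email|reach)( us| me)?\b/i;

const samples = ["Please email us today", "Contact me later", "reaching out"];
for (const text of samples) {
  const found = text.match(contact);
  console.log(`${text}: ${found ? found[0] : "no match"}`);
}
```

The optional group is greedy, so email us is matched as a whole rather than just email.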
Detecting email patterns
To catch generic email address patterns:
\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,}\b
This matches: user@example.com, my.name@sub.domain.co.uk, user+tag@domain.org.
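To list every address a pattern finds in one post, a JavaScript sketch can combine the g flag with matchAll(). findEmails is a hypothetical helper for illustration, and the i flag is an assumption standing in for case-insensitive matching:

```javascript
// Sketch: the "g" flag lets matchAll() collect every address in a post.
// Assumes case-insensitive matching via the "i" flag.
const emailPattern = /\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,}\b/gi;

// Hypothetical helper: returns the matched addresses as plain strings.
function findEmails(text) {
  return Array.from(text.matchAll(emailPattern), (m) => m[0]);
}

console.log(findEmails("Write to user@example.com or user+tag@domain.org."));
```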
Finding hashtag variations
To match hashtags with different casing or slight variations:
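#(disc[o0]urse|f[o0]rum)\b
This matches: #discourse, #DISCOURSE, #disc0urse, #forum, #f0rum, but not #discourseengine or #forums. The # has no special meaning in a regex, so it needs no backslash; escaping it as \# is a syntax error when a pattern is compiled with JavaScript's u flag.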
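A JavaScript check of the hashtag pattern, assuming case-insensitive matching via the i flag. Here the # is written without a backslash:

```javascript
// Sketch, assuming case-insensitive matching via the "i" flag.
// "#" has no special meaning in a regex, so it is not escaped.
const hashtag = /#(disc[o0]urse|f[o0]rum)\b/i;

const samples = ["#DISCOURSE", "#disc0urse", "#f0rum", "#discourseengine", "#forums"];
for (const text of samples) {
  console.log(`${text}: ${hashtag.test(text)}`);
}
```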
Detecting repetitive patterns
To catch repeated characters that might indicate spammy content:
([a-zA-Z])\1{3,}
This detects any letter repeated four or more times in a row, matching words such as aaaample, helllllo, and yessssss.
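A JavaScript sketch showing which run of letters the repetition pattern reports, assuming case-insensitive matching via the i flag:

```javascript
// Sketch: match() returns the first run of 4+ identical letters, or null.
// With the "i" flag, the back-reference \1 also ignores case.
const repeated = /([a-zA-Z])\1{3,}/i;

for (const text of ["helllllo", "yessssss", "bookkeeper", "aAaA"]) {
  const found = text.match(repeated);
  console.log(`${text}: ${found ? found[0] : "no match"}`);
}
```

bookkeeper has only doubled letters, so it does not match. Under the i flag the back-reference also ignores case, so aAaA counts as a repeat.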
Finding URLs with or without protocol
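To catch links whether or not they start with http:// or https://:
\b(?:https?:\/\/)?[\w-]+(\.[\w-]+)+\b
This matches: example.com, sub.domain.org, https://discourse.org, http://meta.discourse.org. It also matches any other dotted token, such as 1.5 or e.g, so expect some false positives.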
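A JavaScript sketch of the URL pattern, assuming case-insensitive matching via the i flag. The version 1.5 sample shows that any dotted token matches, not only real domains:

```javascript
// Sketch, assuming case-insensitive matching via the "i" flag.
const url = /\b(?:https?:\/\/)?[\w-]+(\.[\w-]+)+\b/i;

const samples = [
  "Visit https://discourse.org today",
  "see meta.discourse.org",
  "version 1.5",
  "no links here",
];
for (const text of samples) {
  const found = text.match(url);
  console.log(`${text}: ${found ? found[0] : "no match"}`);
}
```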
Avoiding nested character classes
Correct:
(hold)?
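This makes the word “hold” optional. Use it inside a larger pattern: on its own, an optional group also matches empty text, so it matches everywhere.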
Or if you want character alternatives:
[h][o0][l1][d]
This matches: hold, h0ld, ho1d, h01d. The one-letter classes [h] and [d] are harmless but unnecessary; h[o0][l1]d behaves the same.
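Incorrect:
[h[o0][l1]d]?
Character classes cannot be nested, so this does not mean what it appears to. In JavaScript, [h[o0] is read as a single class containing h, [, o, and 0, followed by [l1], then d, then a literal ] that the ? makes optional. The pattern therefore matches any three-character run like old or 0ld, which catches words such as bold, gold, and told. With JavaScript's u flag, the stray ] is a syntax error instead.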
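A JavaScript sketch that compares the intended pattern with the nested-looking one, and shows what happens when the u flag is used:

```javascript
// Sketch showing how JavaScript reads the nested-looking class.
// Without the "u" flag, the pattern compiles but matches three-character runs.
const intended = /h[o0][l1]d/i;
const nested = new RegExp("[h[o0][l1]d]?", "i");

for (const text of ["h0ld", "gold"]) {
  const found = text.match(nested);
  console.log(`${text}: intended=${intended.test(text)} nested="${found[0]}"`);
}

// With the "u" flag, the lone "]" is rejected outright.
try {
  new RegExp("[h[o0][l1]d]?", "u");
} catch (err) {
  console.log(`u flag: ${err.name}`);
}
```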
Using parentheses for optional words
Correct:
forum(s)?
This properly matches: forum, forums.
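Unnecessary:
forum[s]?
This matches exactly the same text as forum(s)?, but the single-character class adds nothing. For one optional character, forums? is the simplest form; parentheses are needed when the optional part is longer than one character, as with (hold)? above.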
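A quick JavaScript check that the three ways of writing an optional s behave the same, assuming case-insensitive matching via the i flag:

```javascript
// Sketch: three spellings of an optional "s", compared on the same inputs.
const patterns = [/forum(s)?/i, /forum[s]?/i, /forums?/i];

for (const text of ["forum", "forums", "Forums!"]) {
  const results = patterns.map((re) => text.match(re)[0]);
  console.log(`${text}: ${results.join(" | ")}`);
}
```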
Proper character class usage
Correct:
bad word
This matches the phrase “bad word”.
Or for a character class example:
[bB][aA][dD]
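This matches: bad, Bad, bAd, BAD, and so on. Since Discourse matches case-insensitively by default, plain bad would already catch these forms; this example mainly illustrates the syntax.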
Incorrect:
[bad word]
This matches any single character from b, a, d, w, o, r, or a space, not the phrase “bad word”.
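A JavaScript sketch contrasting the phrase with the character class, assuming case-insensitive matching via the i flag:

```javascript
// Sketch, assuming case-insensitive matching via the "i" flag.
const phrase = /bad word/i;
const charClass = /[bad word]/i; // any ONE of: b, a, d, space, w, o, r

for (const text of ["That is a bad word", "drab", "hello", "xyz"]) {
  console.log(`${text}: phrase=${phrase.test(text)} class=${charClass.test(text)}`);
}
```

The class matches drab and hello because each contains at least one of its characters; only text with none of them, like xyz, escapes it.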
Using quantifiers effectively
\b[0-9]{3,5}\b
This matches numbers with 3 to 5 digits: 123, 1234, 12345, but not 12 or 123456.
For specific repeating patterns:
(spam){2,3}
This matches: spamspam, spamspamspam.
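A JavaScript sketch of both quantifier examples. The g flag makes match() return every match in the text rather than only the first:

```javascript
// Sketch: with the "g" flag, match() returns an array of all matches.
const digits = /\b[0-9]{3,5}\b/g;
// Without "g", match() returns the first (greedy) match and its groups.
const repeats = /(spam){2,3}/i;

console.log("12 123 1234 12345 123456".match(digits));
console.log("spamspamspamspam".match(repeats)[0]);
```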
Applying word boundaries properly
Without boundaries:
free
This matches: free, freedom, carefree.
With boundaries:
\bfree\b
This matches only free, not freedom or carefree.
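Counting matches in JavaScript shows how much a plain word over-matches. countMatches is a hypothetical helper for illustration, and the i flag is an assumption standing in for case-insensitive matching:

```javascript
// Hypothetical helper: counts every match of a pattern in a text.
function countMatches(pattern, text) {
  // matchAll requires the "g" flag; "i" assumes case-insensitive matching.
  return Array.from(text.matchAll(new RegExp(pattern, "gi"))).length;
}

const post = "Free freedom, carefree and free.";
console.log(countMatches("free", post)); // plain word
console.log(countMatches("\\bfree\\b", post)); // whole word only
```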
Handling Unicode characters correctly
Correct approach:
(?:^|\s)(ö|oe)zel\b
This matches özel and oezel as whole words, even though ö is not an ASCII letter.
Incorrect approach:
\bözel\b
This fails to match özel as a standalone word, because JavaScript's \b does not treat the Turkish character ö as a word character.
Additional Information
You can test regular expressions at https://regex101.com/. If you do, make sure to switch the regex flavour to ECMAScript.
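Alongside regex101, a small local harness can make testing repeatable. checkPattern is a hypothetical helper, not part of Discourse. It compiles a pattern with the i flag (an assumption about case-insensitive matching) and reports invalid patterns, missed matches, and false positives:

```javascript
// Hypothetical helper for checking a candidate pattern before adding it
// as a watched word. Assumes case-insensitive matching ("i" flag).
function checkPattern(pattern, shouldMatch, shouldNotMatch) {
  let re;
  try {
    re = new RegExp(pattern, "i");
  } catch (err) {
    return [`invalid pattern: ${err.message}`];
  }
  const problems = [];
  for (const text of shouldMatch) {
    if (!re.test(text)) problems.push(`missed: ${text}`);
  }
  for (const text of shouldNotMatch) {
    if (re.test(text)) problems.push(`false positive: ${text}`);
  }
  return problems;
}

console.log(checkPattern("\\bthreads?\\b", ["thread", "threads"], ["threadlike", "unthreading"]));
console.log(checkPattern("[bad word]", ["bad word"], ["hello"]));
console.log(checkPattern("(spam", [], []));
```

An empty array means every sample behaved as expected.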
Watched Words only uses regular expressions for matching. Regex capture and replace is not supported, so captured groups cannot be reused in the Link or Replace actions.
Last edited by @SaraDev 2025-08-08T22:36:35Z