Using Regex with Watched Words

:bookmark: This topic explains how to effectively use regular expressions (regex) within Discourse’s Watched Words feature.

:discourse: If your site is hosted by Discourse, contact team@discourse.org if you would like the watched words regular expressions setting enabled.

Regular expressions (regex) are a powerful tool for defining search patterns. You can use regex in the Watched Words feature to enhance the accuracy and flexibility of word filtering on your Discourse site.

:information_source: To use regular expressions (regex) in watched words, you must first enable the watched words regular expressions site setting.

:warning: Regex is extremely powerful and thus dangerous. An incorrectly written regex statement can cause issues for your users. Test your regex statements on non-production instances before going live.

Example Regex patterns

Here are some common regex patterns and how they can be employed:

Case-insensitivity

By default, Discourse matches both uppercase and lowercase forms of a word.

thread

This will match thread, THREAD, and thReAd.
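Outside Discourse, you can reproduce this with a JavaScript regular expression compiled with the case-insensitive i flag. This is a minimal sketch that assumes watched words are matched that way; it is not Discourse’s own code.

```javascript
// Assumption: a watched word behaves like a JavaScript RegExp
// compiled with the case-insensitive "i" flag.
const pattern = /thread/i;

for (const text of ["thread", "THREAD", "thReAd"]) {
  // RegExp.prototype.test returns true when the pattern occurs anywhere in text.
  console.log(`${text}: ${pattern.test(text)}`); // true for all three
}
```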

Character alternatives

Use character alternatives to expand your matches.

(t|7)hr(3|e)(4|a)d

This will match all of the cases above, plus thr3ad, 7hread, and thr34d.
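Running the pattern against a handful of samples is a quick way to see which spellings an alternation catches. This sketch assumes watched words behave like a case-insensitive JavaScript regex; the sample list is illustrative.

```javascript
// Assumption: watched words are matched like a case-insensitive JS RegExp.
const pattern = /(t|7)hr(3|e)(4|a)d/i;

const samples = ["thread", "THREAD", "thr3ad", "7hread", "thr34d", "thrad"];
for (const text of samples) {
  console.log(`${text}: ${pattern.test(text)}`);
}
// Only "thrad" fails: after "thr" the group (3|e) requires a 3 or an e.
```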

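threads?\b

This will match thread and threads but not threaded or threading. The ? makes the s optional, and the trailing \b (a word boundary, explained in the next section) stops the match when more letters follow.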
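How the pattern ends matters. This sketch, assuming case-insensitive JavaScript matching, compares a trailing \b with a trailing \S+. \S+ demands at least one more non-space character, so it misses a bare thread and catches threaded and threading instead.

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const withBoundary = /threads?\b/i; // the word must end after "thread" or "threads"
const withNonSpace = /threads?\S+/i; // requires one or more extra non-space characters

for (const text of ["thread", "threads", "threaded", "threading"]) {
  console.log(text, withBoundary.test(text), withNonSpace.test(text));
}
// Results (withBoundary, withNonSpace):
// thread    -> true,  false
// threads   -> true,  true
// threaded  -> false, true
// threading -> false, true
```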

Word boundaries

Regex patterns can unintentionally match parts of words. Use word boundaries to avoid partial matches.

\bthreads?\b

This matches thread and threads but avoids matches like threadlike or unthreading.
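Comparing the bounded pattern with the same pattern minus its \b anchors shows what each boundary contributes. The sketch assumes case-insensitive JavaScript matching; unthread is an extra sample, not from the list above, that isolates the leading \b.

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const unbounded = /threads?/i;
const bounded = /\bthreads?\b/i;

const samples = ["thread", "new threads here", "threadlike", "unthreading", "unthread"];
for (const text of samples) {
  console.log(`${text}: unbounded=${unbounded.test(text)} bounded=${bounded.test(text)}`);
}
// unbounded matches every sample; bounded matches only "thread" and "new threads here".
```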

Handling Unicode characters

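Standard word boundaries can fail next to non-ASCII letters. In JavaScript regex, \b and \w treat only A-Z, a-z, 0-9, and the underscore as word characters, so letters such as ü, ß, and ö count as non-word characters. For words containing them, create your own boundaries.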

gr(ü|ue)(ß|ss)e

This matches all commonly spelled forms of the word grüße, including grüsse, gruesse, and GRÜSSE.
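A sample run confirms the spellings covered. This assumes case-insensitive JavaScript matching, under which the i flag also folds Ü to ü; grusse is an added sample showing a spelling the pattern does not cover.

```javascript
// Assumption: case-insensitive JavaScript RegExp matching (the "i" flag).
const pattern = /gr(ü|ue)(ß|ss)e/i;

for (const text of ["grüße", "grüsse", "gruesse", "grueße", "GRÜSSE", "grusse"]) {
  console.log(`${text}: ${pattern.test(text)}`);
}
// Every sample matches except "grusse", which has neither ü nor ue.
```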

Say you want to block the word Über, but not Übersicht. Word boundaries like \b(ü|ue)ber\b don’t work here, because JavaScript’s \b does not treat Ü as a word character: there is no boundary between a space (or the start of the text) and Ü, so Über is missed, while a string like xÜber is matched. Instead, you have to make your own boundaries.

(?:^|\s)(ü|ue)ber\b

This will now appropriately match Über and ueber, but not Übersicht or uebersicht.
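Running both patterns against the same words shows where \b goes wrong. This sketch assumes case-insensitive JavaScript matching; xÜber is an artificial sample chosen to expose the false positive.

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const naive = /\b(ü|ue)ber\b/i; // \b sees Ü as a non-word character
const custom = /(?:^|\s)(ü|ue)ber\b/i; // start of text or whitespace acts as the boundary

for (const text of ["Über", "ueber", "Übersicht", "uebersicht", "xÜber"]) {
  console.log(`${text}: naive=${naive.test(text)} custom=${custom.test(text)}`);
}
// naive misses "Über" but wrongly matches "xÜber";
// custom matches "Über" and "ueber" only.
```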

Catching deliberate character substitutions

To catch words where users substitute numbers or special characters for letters:

\bp[a@]ssw[o0]rd\b

This matches: password, p@ssword, passw0rd, p@ssw0rd, but not mypassword or password123
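To confirm both the substitutions and the boundaries, here is a minimal sketch assuming case-insensitive JavaScript matching:

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const pattern = /\bp[a@]ssw[o0]rd\b/i;

const shouldMatch = ["password", "p@ssword", "passw0rd", "p@ssw0rd"];
const shouldNotMatch = ["mypassword", "password123"]; // no \b before p / after d

for (const text of [...shouldMatch, ...shouldNotMatch]) {
  console.log(`${text}: ${pattern.test(text)}`);
}
```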

Handling characters with punctuation in between

To catch attempts to evade filters by inserting punctuation:

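\bs[\W_]*p[\W_]*a[\W_]*m\b

This matches: spam, s.p.a.m, s-p-a-m, s_p_a_m, but not spammy or myspam. The [\W_] class is needed because \W alone does not match the underscore, which JavaScript counts as a word character.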
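\W does not match the underscore, because JavaScript counts _ as a word character, so \W* cannot bridge s_p_a_m. This sketch, assuming case-insensitive JavaScript matching, compares \W* with [\W_]*, which accepts underscores as well.

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const nonWordOnly = /\bs\W*p\W*a\W*m\b/i;
const nonWordOrUnderscore = /\bs[\W_]*p[\W_]*a[\W_]*m\b/i;

for (const text of ["spam", "s.p.a.m", "s-p-a-m", "s_p_a_m", "spammy", "myspam"]) {
  console.log(`${text}: ${nonWordOnly.test(text)} ${nonWordOrUnderscore.test(text)}`);
}
// Only "s_p_a_m" differs: \W* rejects it, [\W_]* accepts it.
// Neither pattern matches "spammy" or "myspam".
```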

Matching multiple word variations

For matching phrases that might appear with different word forms:

\b(contact|email|reach)( us| me)?\b

This matches: contact, contact us, contact me, email, email us, email me, reach, reach us, reach me
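A short check, assuming case-insensitive JavaScript matching, shows how the optional group and the trailing \b interact; contacts and emailed are added samples that the boundary rejects.

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const pattern = /\b(contact|email|reach)( us| me)?\b/i;

for (const text of ["contact", "Email us today", "reach me", "contacts", "emailed"]) {
  console.log(`${text}: ${pattern.test(text)}`);
}
// The first three match; "contacts" and "emailed" do not,
// because the trailing \b needs the word to end there.
```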

Detecting email patterns

To catch generic email address patterns:

\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,}\b

This matches: user@example.com, my.name@sub.domain.co.uk, user+tag@domain.org
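Besides testing for a match, JavaScript’s String.prototype.match returns the matched text, which is handy for seeing exactly what the pattern picked out of a longer post. This is a sketch assuming case-insensitive matching; the non-matching samples are illustrative.

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const pattern = /\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,}\b/i;

const samples = [
  "user@example.com",
  "my.name@sub.domain.co.uk",
  "user+tag@domain.org",
  "@handle", // nothing before the @
  "user at example dot com", // no @ at all
];
for (const text of samples) {
  console.log(`${text}: ${pattern.test(text)}`);
}

// match()[0] is the address found inside the longer text.
console.log("Write to user@example.com today".match(pattern)[0]); // user@example.com
```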

Finding hashtag variations

To match hashtags with different casing or slight variations:

#(disc[o0]urse|f[o0]rum)\b

This matches: #discourse, #DISCOURSE, #disc0urse, #forum, #f0rum, but not #discourseengine or #forums
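A sample run helps confirm where the trailing \b cuts off longer hashtags. The sketch assumes case-insensitive JavaScript matching and writes # without a backslash, since # has no special meaning in JavaScript regex (and escaping it as \# is a syntax error when the u flag is used).

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const pattern = /#(disc[o0]urse|f[o0]rum)\b/i;

const samples = [
  "#discourse", "#DISCOURSE", "#disc0urse", "#forum", "#f0rum", // should match
  "#discourseengine", "#forums", // letters continue, so \b fails
];
for (const text of samples) {
  console.log(`${text}: ${pattern.test(text)}`);
}
```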

Detecting repetitive patterns

To catch repeated characters that might indicate spammy content:

([a-zA-Z])\1{3,}

This matches: aaaample, helllllo, yessssss, detecting any letter repeated 4 or more times in a row
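The count includes the captured letter itself: ([a-zA-Z]) matches one letter and \1{3,} requires at least three more copies of it, so four in a row in total. This sketch assumes case-insensitive JavaScript matching and builds the a-runs with repeat() so the counts are unambiguous.

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const pattern = /([a-zA-Z])\1{3,}/i;

const threeInARow = "a".repeat(3) + "mple"; // "aaample"
const fourInARow = "a".repeat(4) + "mple"; // "aaaample"

for (const text of [fourInARow, "helllllo", "yessssss", threeInARow, "balloon"]) {
  console.log(`${text}: ${pattern.test(text)}`);
}
// The last two fail: three a's, and pairs like "ll" and "oo", fall short of four.
```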

Finding URLs with or without protocol

\b(?:https?:\/\/)?[\w-]+(\.[\w-]+)+\b

This matches: example.com, sub.domain.org, https://discourse.org, http://meta.discourse.org. It also matches any dotted word, such as file.txt or 3.14, so expect some false positives.
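Here is a sample run, assuming case-insensitive JavaScript matching, with file.txt included as a sample that is not a URL:

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const pattern = /\b(?:https?:\/\/)?[\w-]+(\.[\w-]+)+\b/i;

const samples = [
  "example.com",
  "sub.domain.org",
  "https://discourse.org",
  "http://meta.discourse.org",
  "file.txt", // not a URL, but still matched
  "no links here", // no dot, so no match
];
for (const text of samples) {
  console.log(`${text}: ${pattern.test(text)}`);
}
```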

Avoiding nested character classes

Correct:

(hold)?

This correctly matches the optional word “hold”

Or if you want character alternatives:

[h][o0][l1][d]

This matches: hold, h0ld, ho1d, h01d

Incorrect:

[h[o0][l1]d]?

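This incorrectly tries to nest character classes, which JavaScript regex does not support. It reads [h[o0] as one class containing h, [, o, and 0, followed by the class [l1], a literal d, and an optional literal ]. The pattern therefore matches three-character sequences such as old, hld, or 0ld, including the old inside bold and hold, rather than the word hold with substitutions.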
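To see how JavaScript actually parses the nested version, run both patterns side by side. The sketch assumes case-insensitive JavaScript matching without the u flag (with u, the stray ] would be a syntax error).

```javascript
// Assumption: case-insensitive JavaScript RegExp matching, no "u" flag.
const intended = /h[o0][l1]d/i; // one class per position
const nested = /[h[o0][l1]d]?/i; // parsed as [h[o0] + [l1] + d + optional "]"

for (const text of ["hold", "h0ld", "ho1d", "h01d", "old", "bold", "had"]) {
  console.log(`${text}: intended=${intended.test(text)} nested=${nested.test(text)}`);
}

// Inside "hold", the nested pattern only finds "old".
console.log("hold".match(nested)[0]); // old
```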

Using parentheses for optional words

Correct:

forum(s)?

This properly matches: forum, forums

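Works, but unnecessary:

forum[s]?

This also matches forum and forums, so it is not wrong, but a character class holding a single character adds nothing. For one optional character, forums? is the simplest form; parentheses are what you need when the optional part is longer, as in (hold)? above.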
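A quick check, assuming case-insensitive JavaScript matching, confirms that these ways of writing the optional s match the same text; forums? is added for comparison.

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const patterns = [/forum(s)?/i, /forum[s]?/i, /forums?/i];

for (const pattern of patterns) {
  // match()[0] is the full matched text
  console.log(String(pattern), "forum".match(pattern)[0], "forums".match(pattern)[0]);
}
// Each line prints "forum" and "forums": the three patterns match the same text.
```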

Proper character class usage

Correct:

bad word

To match the phrase “bad word”

Or for a character class example:

[bB][aA][dD]

This matches: bad, Bad, bAd, BAD, etc.

Incorrect:

[bad word]

This matches any single character from b, a, d, w, o, r, or a space, not the phrase “bad word”, so it would flag nearly every post.
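This sketch, assuming case-insensitive JavaScript matching, shows how much the character-class version over-matches compared with the phrase:

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const phrase = /bad word/i;
const characterClass = /[bad word]/i; // any ONE of: b, a, d, space, w, o, r

for (const text of ["that is a bad word", "This is fine.", "xyz"]) {
  console.log(`${text}: phrase=${phrase.test(text)} class=${characterClass.test(text)}`);
}
// "This is fine." matches the class version (it contains a space) but not the phrase.
```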

Using quantifiers effectively

\b[0-9]{3,5}\b

This matches numbers with 3 to 5 digits: 123, 1234, 12345, but not 12 or 123456

For specific repeating patterns:

(spam){2,3}

This matches: spamspam, spamspamspam
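Both quantifier patterns can be checked in a few lines. This sketch assumes JavaScript matching (case-insensitive for the text pattern); spamspamspamspam is an added sample showing that, without boundaries, {2,3} does not stop a longer run from matching.

```javascript
// Assumption: JavaScript RegExp matching.
const digits = /\b[0-9]{3,5}\b/;
const repeated = /(spam){2,3}/i;

for (const text of ["12", "123", "1234", "12345", "123456"]) {
  console.log(`${text}: ${digits.test(text)}`); // true only for 3 to 5 digits
}

for (const text of ["spam", "spamspam", "spamspamspam", "spamspamspamspam"]) {
  // The last sample matches too: its first repetitions already satisfy {2,3}.
  console.log(`${text}: ${repeated.test(text)}`);
}
```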

Applying word boundaries properly

Without boundaries:

free

This matches: free, freedom, carefree

With boundaries:

\bfree\b

This matches only: free, but not freedom or carefree
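Here are the two side by side, assuming case-insensitive JavaScript matching; free! is added to show that punctuation still counts as a boundary:

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const loose = /free/i;
const bounded = /\bfree\b/i;

for (const text of ["free", "free!", "freedom", "carefree"]) {
  console.log(`${text}: loose=${loose.test(text)} bounded=${bounded.test(text)}`);
}
// loose matches all four; bounded matches only "free" and "free!".
```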

Handling Unicode characters correctly

Correct approach:

(?:^|\s)(ö|oe)zel\b

This matches: özel, oezel at word boundaries, even with Unicode characters

Incorrect approach:

\bözel\b

This fails because \b treats ö as a non-word character: özel at the start of the text or after a space is missed, while a string like xözel is matched.
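Running both versions against the same words, assuming case-insensitive JavaScript matching, shows the failure concretely; xözel and özellik are added samples.

```javascript
// Assumption: case-insensitive JavaScript RegExp matching.
const naive = /\bözel\b/i; // \b sees ö as a non-word character
const custom = /(?:^|\s)(ö|oe)zel\b/i; // start of text or whitespace as the boundary

for (const text of ["özel", "çok özel", "oezel", "xözel", "özellik"]) {
  console.log(`${text}: naive=${naive.test(text)} custom=${custom.test(text)}`);
}
// naive matches only "xözel" (a false positive);
// custom matches "özel", "çok özel", and "oezel".
```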

Additional Information

:information_source: You can test regular expressions on https://regex101.com/. If you do, make sure you switch the regex flavour to ECMAScript.

Watched Words uses regex for matching only. Capture and replace (reusing captured groups in the replacement text) is not supported, so it will not work with the Link or Replace actions.

Last edited by @SaraDev 2025-08-08T22:36:35Z


Forgive my noobness, but I was not able to find the site setting for watched words regular expressions anywhere. I also looked for regex, regular expression, and other variants, but didn’t find anything that looked like it would enable regex for watched words. Do you have the slug to the site settings where this could be enabled (cloud hosted instance)?

EDIT: the answer was just above and found here
