Using Regex with Watched Words

:bookmark: This topic explains how to effectively use regular expressions (regex) within Discourse’s Watched Words feature.

:discourse: If your site is hosted with Discourse, please reach out to team@discourse.org if you would like to enable this setting.

Regular expressions (regex) are a powerful tool for defining search patterns. You can use regex in the Watched Words feature to enhance the accuracy and flexibility of word filtering on your Discourse site.

:information_source: To use regular expressions (regex) in watched words, you must first turn on the watched words regular expressions site setting.

:warning: Regex is extremely powerful and thus dangerous. An incorrectly written regex statement can cause issues for your users. Test your regex statements on non-production instances before going live.

Example Regex patterns

Here are some common regex patterns and how they can be employed:

Case-insensitivity

By default, Discourse matches both uppercase and lowercase forms of a word.

thread

This will match thread, THREAD, and thReAd.
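To check this behaviour locally, here is a minimal JavaScript sketch. It assumes that the `i` (ignore case) flag is a fair stand-in for Discourse's default case-insensitive matching. `matchesWatchedWord` is a hypothetical helper for illustration, not part of Discourse.

```javascript
// Hypothetical helper (not a Discourse API): compiles a watched-word pattern
// case-insensitively, which this sketch assumes mirrors Discourse's default.
function matchesWatchedWord(pattern, text) {
  return new RegExp(pattern, "i").test(text);
}

// Print whether each spelling is caught by the plain pattern "thread".
for (const sample of ["thread", "THREAD", "thReAd"]) {
  console.log(sample, matchesWatchedWord("thread", sample));
}
```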

Character alternatives

Use character alternatives to expand your matches.

(t|7)hr(3|e)(4|a)d

This will match all of the cases above, plus thr3ad, 7hread, and thr34d.
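The alternation can be tried out with a short JavaScript sketch. The sample list is made up for illustration, and the `i` flag is an assumption standing in for Discourse's default case-insensitive matching.

```javascript
// Each group accepts either the letter or its look-alike digit.
const leetThread = /(t|7)hr(3|e)(4|a)d/i;

// Illustrative samples; "tread" is included as a word that should not match.
const leetSamples = ["thread", "THREAD", "thr3ad", "7hread", "thr34d", "tread"];

// Pair each sample with the result of testing it against the pattern.
const leetResults = leetSamples.map((sample) => [sample, leetThread.test(sample)]);
console.log(leetResults);
```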

threads?\S+

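This requires at least one non-whitespace character after thread, so it matches threads, threaded, and threading, but not thread on its own. To match only thread and threads, use word boundaries as described in the next section.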

Word boundaries

Regex patterns can unintentionally match parts of words. Use word boundaries to avoid partial matches.

\bthreads?\b

This matches thread and threads but avoids matches like threadlike or unthreading.
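The effect of the boundaries can be compared side by side in a small JavaScript sketch. The unbounded pattern `threads?` is included only for contrast, and the `i` flag is an assumption standing in for Discourse's default case-insensitive matching.

```javascript
const unbounded = /threads?/i;      // no boundaries: matches inside longer words
const bounded = /\bthreads?\b/i;    // boundaries: whole words only

const boundaryWords = ["thread", "threads", "threadlike", "unthreading"];

// Record how each word fares against both patterns.
const boundaryResults = boundaryWords.map((word) => ({
  word,
  unbounded: unbounded.test(word),
  bounded: bounded.test(word),
}));
console.table(boundaryResults);
```

In this sketch the unbounded pattern matches all four words, while the bounded one matches only thread and threads.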

Handling Unicode characters

Standard word boundaries may fail with Unicode characters. Create boundaries for characters not handled well by JavaScript regex.

gr(ü|ue)(ß|ss)e

This matches all commonly spelled forms of the word grüße, including gruesse and GRÜSSE.
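A quick JavaScript sketch of this pattern follows. The sample spellings are illustrative, the `i` flag is an assumption standing in for Discourse's default case-insensitive matching, and the strings are assumed to use precomposed characters (a single code point for ü).

```javascript
// Accept either the umlaut or its "ue" transliteration, and either ß or "ss".
const greeting = /gr(ü|ue)(ß|ss)e/i;

// Illustrative samples; "gruse" is a misspelling that should not match.
const greetingSamples = ["grüße", "gruesse", "GRÜSSE", "grüsse", "Grueße", "gruse"];

const greetingResults = greetingSamples.map((sample) => [sample, greeting.test(sample)]);
console.log(greetingResults);
```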

Say you want to block the word Über, but not Übersicht. Using word boundaries like \b(ü|ue)ber\b doesn’t work because \b in JavaScript regex only treats ASCII letters, digits, and the underscore as word characters, so it does not see a boundary before ü.

Instead, you have to make your own boundaries.

(?:^|\s)(ü|ue)ber\b

This will now appropriately match Über and ueber, but not Übersicht or uebersicht.
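The difference between the two patterns above can be checked with a short JavaScript sketch. The sample strings are made up for illustration, and the `i` flag is an assumption standing in for Discourse's default case-insensitive matching.

```javascript
const asciiBoundary = /\b(ü|ue)ber\b/i;          // \b does not see a boundary before ü
const customBoundary = /(?:^|\s)(ü|ue)ber\b/i;   // start of text or whitespace instead

const uberSamples = ["Über", "ueber", "Übersicht", "uebersicht", "das ist über"];

// Compare both patterns on every sample.
const uberResults = uberSamples.map((text) => ({
  text,
  asciiBoundary: asciiBoundary.test(text),
  customBoundary: customBoundary.test(text),
}));
console.table(uberResults);
```

In this sketch the `\b` version only catches ueber, while the custom boundary catches Über, ueber, and the über at the end of the last sample. Because `(?:^|\s)` consumes the preceding whitespace character, that character is part of the match when the word is not at the start of the text.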

Additional Information

:information_source: You can test regex patterns at https://regex101.com/. If you do, make sure you switch the regex flavour to ECMAScript.
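Patterns can also be checked locally before they go live. The following is a minimal JavaScript sketch of such a check, not a Discourse tool: `checkPattern` is a hypothetical helper, the sample lists are made up, and the `i` flag is an assumption standing in for Discourse's default case-insensitive matching.

```javascript
// Hypothetical helper: returns the samples whose result differs from what
// was expected. An empty array means the pattern behaved as intended.
function checkPattern(pattern, shouldMatch, shouldNotMatch) {
  const regex = new RegExp(pattern, "i");
  const failures = [];
  for (const text of shouldMatch) {
    if (!regex.test(text)) failures.push({ text, expected: "match" });
  }
  for (const text of shouldNotMatch) {
    if (regex.test(text)) failures.push({ text, expected: "no match" });
  }
  return failures;
}

// Backslashes are doubled because the pattern is written as a string literal.
const patternFailures = checkPattern(
  "\\bthreads?\\b",
  ["thread", "Threads", "a thread here"],
  ["threadlike", "unthreading"]
);
console.log(patternFailures.length === 0 ? "all samples behave as expected" : patternFailures);
```

Keeping a list of words that must not match is as useful as the list of words that must, since over-matching is the usual way a pattern causes trouble for users.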

Regex capture and replace are not supported in Watched Words, only matching, so captured groups cannot be used with the link or replace actions.

Last edited by @SaraDev 2024-08-02T19:09:22Z

11 Likes

Forgive my noobness, but I was not able to find the site setting for watched words regular expressions anywhere. I also looked for regex, regular expression, and other variants, but didn’t find anything that looked like it would enable regex for watched words. Do you have the slug to the site settings where this could be enabled (cloud hosted instance)?

EDIT the answer was just above and found here

2 Likes