Could someone tell me about Discourse regexes?


(rizka) #1

I’ve worked on censoring the most vulgar swear words with regular expressions today. Why regexes? Well, in Finnish and other Uralic languages such as Hungarian and Estonian, words are inflected. A single swear word can have potentially thousands of inflected forms, which is why it is awesome to be able to use regex patterns. It is also no coincidence that it was another Finn who originally proposed this feature.
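To illustrate why a single pattern helps with inflection, here is a minimal sketch. It assumes the pattern ends up as an ordinary JavaScript RegExp that supports the `u` flag and lookbehind. It uses a neutral Finnish noun, talo ("house"), instead of a swear word. The stem, the flags and the `findForms` helper are hypothetical choices for illustration.

```javascript
// Hypothetical example: one pattern for many inflected forms of the neutral
// Finnish noun "talo" (house), instead of listing every form separately.
// (?<!\p{L}) stops matches that start in the middle of another word, and
// \p{L}* takes any trailing letters, including ä and ö (needs the "u" flag).
const taloForms = /(?<!\p{L})talo\p{L}*/giu;

function findForms(text) {
  // String.prototype.match with a global regex returns every match, or null
  return text.match(taloForms) ?? [];
}

console.log(findForms("Talo, talon, talossa ja taloissa; myös talolle."));
// [ 'Talo', 'talon', 'talossa', 'taloissa', 'talolle' ]
```

A plain stem pattern like this misses forms where the stem itself changes (consonant gradation, for example), so a real pattern may need a few alternatives.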

I need some quick advice about which regex flavor Discourse uses. I’m seeing some unexpected behavior with non-alphanumeric characters, which is awkward, especially because ä is a common letter in the Finnish alphabet. I got the regex into pretty good shape with basic regex knowledge and trial and error, but for an even better result I would need documentation or something similar.
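If the pattern is evaluated as a standard JavaScript RegExp, one likely source of trouble with ä is that JavaScript's `\w` and `\b` treat only ASCII letters, digits and `_` as word characters. The sketch below compares `\b` with a letter-aware lookaround version. The lookaround pattern is a suggested workaround, not something Discourse is known to use, and it needs a browser that supports lookbehind and `\p{...}` escapes.

```javascript
// Without the "u" flag, \w covers only ASCII letters, digits and "_", and \b
// is defined in terms of \w, so "ä" counts as a non-word character.
// Adding the "u" flag still does not make "ä" a word character.
const asciiBoundary = /\bpää\b/i;

// Suggested workaround (an assumption, not a documented Discourse pattern):
// lookarounds with \p{L} reject matches that touch another letter.
const letterBoundary = /(?<!\p{L})pää(?!\p{L})/iu;

for (const text of ["pää", "pääkaupunki", "Pää on kipeä"]) {
  console.log(text, asciiBoundary.test(text), letterBoundary.test(text));
}
// pää false true
// pääkaupunki true false
// Pää on kipeä false true
```

With `\b`, the standalone word pää is missed while the compound pääkaupunki is matched. The lookaround version does the opposite.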


(Rafael dos Santos Silva) #2

You can read about it in the source code.


(Eli the Bearded) #3

Reading that, I don’t see much about them, except that they are JavaScript regular expressions. (Without that link I would have assumed Ruby.) So a JavaScript reference would be in order.

That reference has internal links to the specifications, if you want to go deeper.
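Because the patterns are JavaScript regular expressions, they follow JavaScript RegExp syntax, and a malformed pattern makes `new RegExp` throw a `SyntaxError`. Here is a minimal sketch of compiling a pattern string and escaping characters that should be matched literally. `compileWatchedWord`, `escapeRegExp` and the default `"giu"` flags are hypothetical choices for illustration, not Discourse's actual code.

```javascript
// Hypothetical helper: compile a pattern string into a RegExp and report
// malformed patterns instead of letting the SyntaxError propagate.
function compileWatchedWord(pattern, flags = "giu") {
  try {
    return { ok: true, regex: new RegExp(pattern, flags) };
  } catch (err) {
    // new RegExp throws a SyntaxError for invalid syntax,
    // such as an unmatched closing round bracket.
    return { ok: false, error: err.message };
  }
}

// Escape regex metacharacters so a string is matched literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

console.log(compileWatchedWord("hups)").ok);               // false
console.log(compileWatchedWord(escapeRegExp("hups)")).ok); // true
```

Only escape the parts that should match literally. Grouping brackets in a real pattern must stay unescaped and balanced.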


(Mittineague) #4

AFAIK, Ruby and Postgres support POSIX character classes (e.g. [[:alpha:]]).

https://ruby-doc.org/core-2.2.0/Regexp.html

https://www.postgresql.org/docs/9.3/static/functions-matching.html
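POSIX bracket expressions are a Ruby and Postgres feature. If the censoring pattern is evaluated as a JavaScript RegExp, they do not carry over. The small sketch below assumes plain JavaScript RegExp semantics. It shows what JavaScript does with `[[:alpha:]]` and offers `\p{L}` as one possible replacement.

```javascript
// In Ruby and PostgreSQL, [[:alpha:]] is a POSIX bracket expression.
// JavaScript has no POSIX classes: without the "u" flag it parses this as the
// character class [[:alph] followed by a literal "]".
const posixStyle = /[[:alpha:]]/;

// A JavaScript counterpart (requires the "u" flag): \p{L} matches any Unicode letter.
const unicodeLetter = /\p{L}/u;

console.log(posixStyle.test("ä"));    // false
console.log(posixStyle.test("a]"));   // true: "a" from the class, then "]"
console.log(unicodeLetter.test("ä")); // true
```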


(rizka) #5

Cool, thank you all for your replies. I’ll look into them and come back if I still can’t figure it out.