After a successful import of mail archives (mbox), the messages display email addresses that Gmane or the mailman2 archive server would have obfuscated. This lets address-harvesting bots collect them from Discourse, and I’m looking for a way to avoid this.
One idea: globally removing email addresses from the posts (a display plugin, maybe?)
So this is a problem with the import which, if you still have the chance, should be fixed during the import phase. I have taken a look at your forum and it is full of broken content (email headers, wrong indentation) because quoted older emails are not being cut off, and emails that are replies to each other end up in different topics.
Either you have enabled show_trimmed_content (here) during import or your message format did not get recognized by the reply trimmer code (here). It looks like there are a lot of other issues as well, though.
Good guess: I indeed set show_trimmed_content to true because the reply trimmer code frequently trims more than it should. Not just in the imported mbox, it also happened daily with replies by email. Although it should be possible to improve the trimmer it seemed like an uphill battle. People using mail will always (i) send weirdly formatted emails for whatever reason, (ii) expect them to be displayed in full.
There indeed are other issues in the import: it is far from perfect. Although I’d be happy to discuss them, they are not an immediate concern.
Since it looks like I did not miss an import option that would obfuscate emails, there apparently are two options left:
Globally replace content in the posts with something like s/{email_regexp}/obfuscated/
Find/write a plugin that obfuscates displayed content (HTML converter?) with s/{email_regexp}/obfuscated/
Make sure to back up Discourse before trying the following.
Here is how to replace all email addresses in posts with [email_redacted]. The regular expression is rather limited and is likely to miss some addresses, but I prefer an expression I can read and understand when modifying the content of all posts.
$ ./launcher enter app
/var/www/discourse# su - postgres -c psql
psql (13.2 (Debian 13.2-1.pgdg100+1))
Type "help" for help.
postgres=# \c discourse
You are now connected to database "discourse" as user "postgres".
discourse=# \set re '[0-9a-z._%+-]+@[a-z0-9.-]+\\.[a-z]{2,64}'
discourse=# update posts set raw = regexp_replace(raw, :'re', '[email_redacted]', 'gi') where raw ~* :'re';
UPDATE 1
discourse=# update posts set cooked = regexp_replace(cooked, :'re', '[email_redacted]', 'gi') where cooked ~* :'re';
UPDATE 1
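Before running the updates on a real forum, it may help to try the expression on a few sample lines to see what it catches and what it leaves alone. The following is only a rough sketch in Python, not part of Discourse: the `redact` helper and the sample strings are made up for illustration, and the pattern is the one from the session with a single backslash before the dot. Python's `re` engine is not PostgreSQL's, so treat the result as an approximation.

```python
import re

# Same expression as in the psql session; IGNORECASE stands in for the 'i' flag.
# Assumption: Python's regex engine behaves like PostgreSQL's for this simple pattern.
EMAIL_RE = re.compile(r"[0-9a-z._%+-]+@[a-z0-9.-]+\.[a-z]{2,64}", re.IGNORECASE)


def redact(text: str) -> str:
    """Replace every match with the placeholder used in the session (hypothetical helper)."""
    return EMAIL_RE.sub("[email_redacted]", text)


# Made-up sample lines, not taken from any real forum.
samples = [
    "From: Jane Doe <jane.doe@example.org>",
    "Contact ADMIN@EXAMPLE.COM for access",
    "already obfuscated: jane at example dot org",
    "git clone git@github.com:user/repo.git",
]

for line in samples:
    print(redact(line))
```

The last sample shows the flip side of a simple expression: anything shaped like user@host.tld is replaced, including SSH-style Git remotes, so it is worth checking a handful of affected posts after the update.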