Replace a string in all posts

Want to replace a string in all the posts? Let’s do it!

Access your site

First connect to your Droplet via SSH, and enter the Docker container for your Discourse instances:

cd /var/discourse
./launcher enter app

:warning: WARNING

We strongly recommend you take a full backup before proceeding, and make sure your string replacement is specific enough to affect only the places you want it to. If this string replacement goes wrong, every post on your site will look broken!

Remap all posts containing a specific string

Basic Case-sensitive Remap
Run the following command, substituting find with the string you wish to replace and replace with the replacement string. Note that this does not respect word boundaries.

rake posts:remap["find","replace"]

Example results:

find —> replace
Find —> Find
FIND —> FIND
finders keepers —> replaceers keepers
finding —> replaceing

This method can be useful for tasks such as replacing emojis:

rake posts:remap[":slightly_smiling:",":slight_smile:"]

The above command will replace all occurrences of :slightly_smiling: with :slight_smile:.

Case-insensitive Remap

rake posts:remap["find","replace","string",true]

Example results:

find —> replace
Find —> replace
FIND —> replace
finders keepers —> replaceers keepers
finding —> replaceing

Regex Remap

If you need more specificity and feel a little adventurous, you can use regex!

To handle word boundaries, you currently need to format your command like so:

rake posts:remap['(?<!\\w)(?=\\w)find(?<=\\w)(?!\\w)','replace','regex']

Example results:

replace —> replace
Find —> Find
FIND —> FIND
finders keepers —> finders keepers
finding —> finding

Remove/Delete all occurrences of a word/string

The concept is the same as above, except you are removing a word entirely instead of replacing it:

Basic Case-sensitive Delete

rake posts:delete_word["word-to-delete"]

Case-insensitive Delete

rake posts:delete_word["word-to-delete","string",true]

Regex Delete

rake posts:delete_word['\\[color=#[0-9a-fA-F]{3\,6}\\]','regex']

Example results:

[color=#ff3333] —>
[color=#ffffff]testing —> testing
[color=#101010]testing[/color] —> testing[/color]

51 Likes

6 posts were split to a new topic: How to replace a string/character in topic title?

Now that the rake posts:remap task accepts regular expressions, I’m trying to figure out how to use it to replace a string in all posts and respect word boundaries.

For example, given a post that contains the words ‘foo’, ‘food’, and ‘surefooted’, how could I replace ‘foo’ with ‘pizza’, while not altering the words ‘food’ or ‘surefooted’? I’ve tried a few ways to get this to work. I’m fairly sure I’m missing something obvious.

3 Likes

I think when I’ve done that in the past I’ve given up and done it with rails. Figuring out how to get stuff escaped correctly through bash and into the rake task was more than I could manage.

3 Likes

The raw regex would be

\b(foo)\b

Where

  • \b means “word boundary”
  • the parens indicate “this is the part of the match we want to replace”, we don’t want to replace the word boundary. This match is referred to as $1 by default.

Here is a little demo in regexr.com

Escaping of slashes and such (if you are feeding it in at the command line) can make things more complicated, though.

1 Like

It’s a tricky one. The problem I’m finding is that the posts to be remapped are found with the Post.raw_match method. That method seems to expect TCL word boundaries (\mfoo\M or \yfoo\y.)

When the posts are returned, the content is replaced by calling gsub on the post’s raw content. gsub expects \b for word boundaries.

If I add a line to the rake task to substitute \b for \y in the regex pattern after the posts have been found, calling rake posts:remap['\\yfoo\\y','pizza','regex'] returns the correct results.

There might be a way to look for word boundaries that would be accepted by both mothods that are used in the task.

Edit: from the TCL docs:

If your regex flavor supports lookahead and lookbehind, you can use (?<!\w)(?=\w) to emulate Tcl’s \m and (?<=\w)(?!\w) to emulate \M. Though quite a bit more verbose, these lookaround constructs match exactly the same as Tcl’s word boundaries.

This works:
rake posts:remap['(?<!\\w)(?=\\w)foo(?<=\\w)(?!\\w)','pizza','regex']

5 Likes

Ugh so this method uses some weirdo non-standard regex syntax, and not just that, but two different flavors? Any ideas here @riking?

2 Likes

Where I’m finding I need to use the unexpected syntax is in the call to Post.raw_match. That’s an easy one to test from the rails console.

I was wondering if this was just an issue related to my local installation, but I’ve tested it on my hosted site and got similar results.

1 Like

Checking the PostgreSQL docs… we have the amazing "\b is backspace" interpretation!

https://www.postgresql.org/docs/9.3/functions-matching.html#POSIX-ESCAPE-SEQUENCES

We’re passing the regex directly to the Postgres evaluator, so that’s where these weird word boundary definitions are coming from.

Many of the ARE extensions are borrowed from Perl, but some have been changed to clean them up, and a few Perl extensions are not present. Incompatibilities of note include \b, \B, the lack of special treatment for a trailing newline, the addition of complemented bracket expressions to the things affected by newline-sensitive matching, the restrictions on parentheses and back references in lookahead constraints, and the longest/shortest-match (rather than first-match) matching semantics.

4 Likes

@codinghorror Is there an option for this for communities hosted by Discourse?

1 Like

I believe that you can contact support through the normal channels to request this.

2 Likes

Hi,

I’m currently trying to use regex to change the string [member=xxxx]aaaaa where xxxx is a number from 1 to 5 digits and aaaaa a word right after the ]. I just want to change the first part, not the word right after.

I’ve tried using the following code with no success:
rake posts:remap[‘\[member\=\d+\]’,’\@‘,regex]

I thought maybe it was the word right after but no success either:
rake posts:remap[‘\[member\=\d+\]\(w+\)’,’\@$1‘,regex]

Any ideas?

What would be the command to replace some strings only for messages posted after a certain date (or other criteria)?

I wonder if you are running into the issue with word boundaries that is outlined in this post: Replace a string in all posts? If so, there is a workaround for the issue in the code example that’s at the bottom of the post.

3 Likes