Hi there, I’m trying to run some regexp_replace
commands in the pg console to convert links in my testbed Discourse instance that was migrated from a huge Drupal 7 forum. Across almost 2M posts there are a bunch of links in Textile format that need to be converted to Markdown. This is what I’m trying to use:
It works correctly in the above tester as well as in my text editor; it turns this:
Cqwertyuioy - Lasgfdf Sddgfdds (Dsajjsa Vsjsjk Osaskgkk Spfs) "link":http://www.youtube.com/watch?v=aQjkOmzQ8RT
Into this:
Cqwertyuioy - Lasgfdf Sddgfdds (Dsajjsa Vsjsjk Osaskgkk Spfs) [link](http://www.youtube.com/watch?v=aQjkOmzQ8RT)
But in the Discourse pg console I run this:
update posts set raw = regexp_replace(raw, '"(.*?)"\:(http\S+?(?=\W+(?:$|\s))|http\S+)', E'[\\1](\\2)', 'g');
And I end up with this:
Cqwertyuioy - Lasgfdf Sddgfdds (Dsajjsa Vsjsjk Osaskgkk Spfs) [link](http:)//www.youtube.com/watch?v=aQjkOmzQ8RT
(Sorry for the anonymization.) So the important part is:
-
Correct:
[link](http://www.youtube.com/watch?v=aQjkOmzQ8RT)
-
Wrong::
[link](http:)//www.youtube.com/watch?v=aQjkOmzQ8RT
I imagine it has something to do with the escape character weirdness in the case of SQL queries and/or the special functions of []
and ()
vs literal, but I can’t figure it out. Any ideas? Thanks!