Postgres console edge case not obeying POSIX regexp rules?

Hi there, I’m trying to run some regexp_replace commands in the pg console to convert links in my testbed Discourse instance that was migrated from a huge Drupal 7 forum. Across almost 2M posts there are a bunch of links in Textile format that need to be converted to Markdown. This is what I’m trying to use:

It works correctly in the above tester as well as in my text editor; it turns this:

  • Cqwertyuioy - Lasgfdf Sddgfdds (Dsajjsa Vsjsjk Osaskgkk Spfs) "link":http://www.youtube.com/watch?v=aQjkOmzQ8RT

Into this:

  • Cqwertyuioy - Lasgfdf Sddgfdds (Dsajjsa Vsjsjk Osaskgkk Spfs) [link](http://www.youtube.com/watch?v=aQjkOmzQ8RT)

But in the Discourse pg console I run this:

  • update posts set raw = regexp_replace(raw, '"(.*?)"\:(http\S+?(?=\W+(?:$|\s))|http\S+)', E'[\\1](\\2)', 'g');

And I end up with this:

  • Cqwertyuioy - Lasgfdf Sddgfdds (Dsajjsa Vsjsjk Osaskgkk Spfs) [link](http:)//www.youtube.com/watch?v=aQjkOmzQ8RT

(Sorry for the anonymization.) So the important part is:

  • Correct: [link](http://www.youtube.com/watch?v=aQjkOmzQ8RT)
  • Wrong:: [link](http:)//www.youtube.com/watch?v=aQjkOmzQ8RT

I imagine it has something to do with the escape character weirdness in the case of SQL queries and/or the special functions of [] and () vs literal, but I can’t figure it out. Any ideas? Thanks!

I don’t know PostgreSQL, but it works using the rails console:

Post.find_each do |p|
  p.raw.gsub!(/"(.*?)"\:(http\S+?(?=\W+(?:$|\s))|http\S+)/, '[\\1](\\2)')
  p.save
end
2 Likes

Ahh, thanks very much for showing me how to use the Rails console for that. It looks like it’s many orders of magnitude slower than running a raw SQL command (which makes sense), so I’ll report back if/when the command finishes.

Well, I ended up aborting the Rails process and went back to the pg console. I also received a suggestion for a much more simple (and hopefully more easy to debug) regexp that also works very well outside of the pg console:
"(.*?)":(\S*)\b

But there’s definitely something non-standard about the SQL query, this did nothing at all to the links:
update posts set raw = regexp_replace(raw, '"(.*?)":(\S*)\b', E'[\\1](\\2)', 'g');
Nor did this:
update posts set raw = regexp_replace(raw, '"(.*?)"\:(\S*)\b', E'[\\1](\\2)', 'g');

So what gives?