Gsub producing different results running the same code

Maybe this is a stackoverflow question and I don’t understand what gsub does~~, but this looks like bizarre ruby behavior that I wonder if it could somehow be the ruby image?~~ I get the same results in irb on my local machine.

I thought this might be some behavior of dup that I wasn’t understanding, but I can replicate the behavior if I define the string twice. The first time the gsub fails to insert the URL, but subsequent runs of the same data include it as I expect.

[1] pry(main)> save=%(<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me u
ntil yesterday, so perhaps on Android, it's still in A/B testing. <UPL-IMAGE-PREVIEW url="https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85
280-screen-shot-2021-08-29-at-83023-pm.png">[upl-image-preview url=https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08
-29-at-83023-pm.png]</UPL-IMAGE-PREVIEW></p>
[1] pry(main)* </r>
[1] pry(main)* )
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so
 perhaps on Android, it's still in A/B testing. <UPL-IMAGE-PREVIEW url=\"https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2
021-08-29-at-83023-pm.png\">[upl-image-preview url=https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.
png]</UPL-IMAGE-PREVIEW></p>\n</r>\n"
[2] pry(main)> s=save.dup
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so
 perhaps on Android, it's still in A/B testing. <UPL-IMAGE-PREVIEW url=\"https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2
021-08-29-at-83023-pm.png\">[upl-image-preview url=https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.
png]</UPL-IMAGE-PREVIEW></p>\n</r>\n"
[3] pry(main)> s.gsub!(/<UPL-IMAGE-PREVIEW url="(.+?)">.+?<\/UPL-IMAGE-PREVIEW>/i,"\nIMAGEISHERE\n#{$1}\n")
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so
 perhaps on Android, it's still in A/B testing. \nIMAGEISHERE\n\n</p>\n</r>\n"
[4] pry(main)> s=save.dup
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. <UPL-IMAGE-PREVIEW url=\"https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png\">[upl-image-preview url=https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png]</UPL-IMAGE-PREVIEW></p>\n</r>\n"
[5] pry(main)> s.gsub!(/<UPL-IMAGE-PREVIEW url="(.+?)">.+?<\/UPL-IMAGE-PREVIEW>/i,"\nIMAGEISHERE\n#{$1}\n")
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. \nIMAGEISHERE\nhttps://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png\n</p>\n</r>\n"

Subsequent runs of

s=save.dup
s.gsub!(/<UPL-IMAGE-PREVIEW url="(.+?)">.+?<\/UPL-IMAGE-PREVIEW>/i,"\nIMAGEISHERE\n#{$1}\n")

produce the URL replacement as I expect.

work as expected.

I was having a similar problem the other day that I described at this post, but in a previous edit. That code was

def fix_slack_posts
  SiteSetting.min_post_length = 2
  reg=/(\*\*)(This topic was automatically generated from Slack. You can find the original thread \[here\].+?\))(\*\*\.)?\s*?([a-zA-Z, ()]* : )(.*)/m
  preg = /([a-zA-Z, ()]+? : )(.*)/m
  topic_posts = Post.where("raw like '**This topic was automatically%'")
  topic_posts.each do |tpost|
    begin
      tpost.raw.gsub!(reg,"#{$5}\n\n#{$2}.")
      tpost.save!
      tpost.rebake!
    rescue
      puts "Can't update topic post #{tpost.raw}"
    end
    posts = Post.where(topic_id: tpost.topic_id).where("post_number > 1")
    posts.each do |post|
      if post.raw.gsub(preg,"#{$2}").length>=10
        begin
          post.raw.gsub!(preg,"#{$2}")
          post.save!
          post.rebake!
        rescue
          puts "#{post.id}--cannot save #{post.raw}. "
        end
      end
    end
  end
  SiteSetting.min_post_length = 10
end

And what was happening is that the tpost.raw on the second iteration of the loop was getting the value of the last post.raw on the previous iteration.

I thought I was going crazy, but I have seen this

Here’s a simpler oddity. This version has gsub not working on the first run, but working on the second:

[1] pry(main)> s=%(<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. <UPL-IMAGE-PREVIEW url="https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png">[upl-image-preview url=https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png]</UPL-IMAGE-PREVIEW></p>
[1] pry(main)* </r>
[1] pry(main)* )
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. <UPL-IMAGE-PREVIEW url=\"https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png\">[upl-image-preview url=https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png]</UPL-IMAGE-PREVIEW></p>\n</r>\n"
[2] pry(main)> s.gsub(/<UPL-IMAGE-PREVIEW url="(.+?)">.+?<\/UPL-IMAGE-PREVIEW>/mi,"\nIMAGEISHERE\n#{$1}\n")
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. \nIMAGEISHERE\n\n</p>\n</r>\n"
[3] pry(main)> s.gsub(/<UPL-IMAGE-PREVIEW url="(.+?)">.+?<\/UPL-IMAGE-PREVIEW>/mi,"\nIMAGEISHERE\n#{$1}\n")
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. \nIMAGEISHERE\nhttps://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png\n</p>\n</r>\n"


Variables like $1 are populated after a regex match is performed. Simplifying your example a little:

s.gsub('original(value)', "replacement#{$1}")

The "replacement#{$1}" expression is evaluated before the gsub function is called. So $1 will be leftover from some previous regex task. (That’s why your second attempt works - it takes $1 from the first attempt)

There are a couple of options to solve this problem. gsub has lots of different functionalities.

My preference would be to pass a block to gsub. The block is evaluated after the regex has been matched, so $1 will work as you expect:

s.gsub('original(value)') { |match| "replacement#{$1}" }

Or you can use the ‘backreference’ feature of gsub. I’m not a fan of this syntax, but it does work. Rather than using "replacement#{$1}", you use 'replacement\1' (or \2, or \3, etc.)

s.gsub('original(value)', 'replacement\1')
5 Likes

OMG. Thank you. I thought I was going crazy.

Brilliant.

I’d always wondered about those backslashes. Five years later, I get it.

This totally was a stack exchange question. :man_shrugging:

A million thanks.

2 Likes

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.