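Gsub gives different results when running the same code

This may really be a Stack Overflow question, and I may just not understand what gsub does, but it looks like odd Ruby behavior. I wondered whether it had something to do with the Ruby image, but I get the same result in irb on my local machine.

I thought it might be some behavior of dup that I don't understand, but I can reproduce it even if I define the string twice. The first gsub fails to insert the URL, while later runs on the same data include it as expected.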

[1] pry(main)> save=%(<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. <UPL-IMAGE-PREVIEW url="https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png">[upl-image-preview url=https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png]</UPL-IMAGE-PREVIEW></p>
[1] pry(main)* </r>
[1] pry(main)* )
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. <UPL-IMAGE-PREVIEW url=\"https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png\">[upl-image-preview url=https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png]</UPL-IMAGE-PREVIEW></p>\n</r>\n"
[2] pry(main)> s=save.dup
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. <UPL-IMAGE-PREVIEW url=\"https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png\">[upl-image-preview url=https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png]</UPL-IMAGE-PREVIEW></p>\n</r>\n"
[3] pry(main)> s.gsub!(/<UPL-IMAGE-PREVIEW url="(.+?)">.+?<\/UPL-IMAGE-PREVIEW>/i,"\nIMAGEISHERE\n#{$1}\n")
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. \nIMAGEISHERE\n\n</p>\n</r>\n"
[4] pry(main)> s=save.dup
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. <UPL-IMAGE-PREVIEW url=\"https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png\">[upl-image-preview url=https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png]</UPL-IMAGE-PREVIEW></p>\n</r>\n"
[5] pry(main)> s.gsub!(/<UPL-IMAGE-PREVIEW url="(.+?)">.+?<\/UPL-IMAGE-PREVIEW>/i,"\nIMAGEISHERE\n#{$1}\n")
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. \nIMAGEISHERE\nhttps://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png\n</p>\n</r>\n"

Subsequent runs of

s=save.dup
s.gsub!(/<UPL-IMAGE-PREVIEW url="(.+?)">.+?<\/UPL-IMAGE-PREVIEW>/i,"\nIMAGEISHERE\n#{$1}\n")

produce the URL replacement as expected.

I ran into a similar problem the other day and described it in this post, though that was in an earlier edit. The code was:

def fix_slack_posts
  SiteSetting.min_post_length = 2
  reg=/(\*\*)(This topic was automatically generated from Slack. You can find the original thread \[here\].+?\))(\*\*\.)?\s*?([a-zA-Z, ()]* : )(.*)/m
  preg = /([a-zA-Z, ()]+? : )(.*)/m
  topic_posts = Post.where("raw like '**This topic was automatically%'")
  topic_posts.each do |tpost|
    begin
      tpost.raw.gsub!(reg,"#{$5}\n\n#{$2}.")
      tpost.save!
      tpost.rebake!
    rescue
      puts "Can't update topic post #{tpost.raw}"
    end
    posts = Post.where(topic_id: tpost.topic_id).where("post_number > 1")
    posts.each do |post|
      if post.raw.gsub(preg,"#{$2}").length>=10
        begin
          post.raw.gsub!(preg,"#{$2}")
          post.save!
          post.rebake!
        rescue
          puts "#{post.id}--cannot save #{post.raw}. "
        end
      end
    end
  end
  SiteSetting.min_post_length = 10
end

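What happened was that on the second iteration of the loop, tpost.raw picked up the value of the last post.raw from the previous iteration.

I thought I was going crazy, but I really did see it.

Here is a simpler version of the anomaly. In this one, gsub doesn't work on the first run but does on the second: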

[1] pry(main)> s=%(<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. <UPL-IMAGE-PREVIEW url="https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png">[upl-image-preview url=https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png]</UPL-IMAGE-PREVIEW></p>
[1] pry(main)* </r>
[1] pry(main)* )
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. <UPL-IMAGE-PREVIEW url=\"https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png\">[upl-image-preview url=https://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png]</UPL-IMAGE-PREVIEW></p>\n</r>\n"
[2] pry(main)> s.gsub(/<UPL-IMAGE-PREVIEW url="(.+?)">.+?<\/UPL-IMAGE-PREVIEW>/mi,"\nIMAGEISHERE\n#{$1}\n")
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. \nIMAGEISHERE\n\n</p>\n</r>\n"
[3] pry(main)> s.gsub(/<UPL-IMAGE-PREVIEW url="(.+?)">.+?<\/UPL-IMAGE-PREVIEW>/mi,"\nIMAGEISHERE\n#{$1}\n")
=> "<p>On Android, click the three-dot icon in the upper right corner and select Relations from the popup menu. This function wasn't working for me until yesterday, so perhaps on Android, it's still in A/B testing. \nIMAGEISHERE\nhttps://somehost.s3.eu-central-1.amazonaws.com/2021-08-29/1630236738-85280-screen-shot-2021-08-29-at-83023-pm.png\n</p>\n</r>\n"

Variables like $1 are populated only after a regular-expression match has been performed. Simplifying your example:

s.gsub(/original(value)/, "replacement#{$1}")

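The expression "replacement#{$1}" is evaluated before gsub is called, so $1 still holds whatever an earlier regex operation left in it. (That is why your second attempt worked: it used the $1 from the first attempt.)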
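To make the ordering concrete, here is a minimal sketch with made-up strings (they are not from the posts above). The replacement argument is built before gsub runs, so it picks up whatever $1 held at that moment:

```ruby
# Arguments are evaluated before the method is called, so "#{$1}" is
# interpolated using whatever the previous match left behind.
"no digits here" =~ /(\d+)/          # failed match: $1 is now nil

first = "id=42".gsub(/id=(\d+)/, "<#{$1}>")
# first == "<>" because $1 was nil when the argument string was built

# That gsub call did match, so $1 is now "42" ...
second = "id=7".gsub(/id=(\d+)/, "<#{$1}>")
# ... and this call inserts the stale "42", not "7"
```

The same thing happens inside a loop: each iteration interpolates the captures left over from the previous iteration's match, which would explain the values carried over in fix_slack_posts.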

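There are a few ways to fix this, since gsub can be called in several forms.

I prefer passing a block to gsub. The block is evaluated after the regex has matched, so $1 works the way you expect: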

s.gsub(/original(value)/) { |match| "replacement#{$1}" }
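Applied to the tag from the question, the block form might look like the sketch below. The HTML is a shortened stand-in with a hypothetical example.com URL, and the variable names are only illustrative:

```ruby
# Shortened stand-in for the post HTML from the question (hypothetical URL).
html = %(<p>Text <UPL-IMAGE-PREVIEW url="https://example.com/a.png">[upl-image-preview url=https://example.com/a.png]</UPL-IMAGE-PREVIEW></p>)

pattern = /<UPL-IMAGE-PREVIEW url="(.+?)">.+?<\/UPL-IMAGE-PREVIEW>/mi

# The block runs once per match, after the match has set $1.
result = html.gsub(pattern) { "\nIMAGEISHERE\n#{$1}\n" }

# Regexp.last_match(1) reads the same capture under a more explicit name.
same = html.gsub(pattern) { "\nIMAGEISHERE\n#{Regexp.last_match(1)}\n" }
```

Because the block is re-evaluated for every match, this also behaves correctly when a post contains several image tags.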

Alternatively, you can use gsub's "backreference" feature. I don't love the syntax, but it works. Instead of "replacement#{$1}", you use 'replacement\1' (or \2, \3, and so on).

s.gsub(/original(value)/, 'replacement\1')
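Quoting matters for backreferences, so here is a small sketch of the variants. The sample strings are invented for illustration; the last pattern is the preg regex from fix_slack_posts:

```ruby
text = "original value"

# Single quotes keep the backslash, so gsub sees \1 and substitutes group 1.
a = text.gsub(/original (value)/, 'replacement \1')

# In double quotes the backslash must be doubled; "\1" on its own would be
# the character with code 1, not a backreference.
b = text.gsub(/original (value)/, "replacement \\1")

# Named groups can be referenced with \k<name>.
c = text.gsub(/original (?<word>value)/, 'replacement \k<word>')

# The preg pattern from the question: keep only the text after "Name : ".
preg = /([a-zA-Z, ()]+? : )(.*)/m
d = "Jay Doe : hello there".gsub(preg, '\2')
```

Backreferences are resolved by gsub itself for each match, so they never depend on what an earlier match left in $1.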

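Oh my goodness, thank you. I thought I was going crazy.

Awesome.

I've always wondered about those backslashes. Five years on, I finally get it.

This was totally a Stack Exchange question. :man_shrugging:

Thanks a million.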