Took a quick look out of curiosity. It looks like an issue with the Nokogiri
library.
From what Discourse uses here:
    doc = Nokogiri::HTML5.fragment(sanitized)
    add_nofollow = !options[:omit_nofollow] && SiteSetting.add_rel_nofollow_to_user_content
    add_rel_attributes_to_user_content(doc, add_nofollow)
    strip_hidden_unicode_bidirectional_characters(doc)
    sanitize_hotlinked_media(doc)
    add_mentions(doc, user_id: opts[:user_id]) if SiteSetting.enable_mentions

    scrubber = Loofah::Scrubber.new { |node| node.remove if node.name == "script" }
    loofah_fragment = Loofah.fragment(doc.to_html)
    loofah_fragment.scrub!(scrubber).to_html
  end

  def self.strip_hidden_unicode_bidirectional_characters(doc)
    return if !DANGEROUS_BIDI_REGEXP.match?(doc.content)

    doc
      .css("code,pre")
      .each do |code_tag|
        next if !DANGEROUS_BIDI_REGEXP.match?(code_tag.content)
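Loofah.fragment uses Nokogiri's HTML4 parser, so the document is parsed as HTML5 first, serialized with doc.to_html, and then re-parsed as HTML4 for the final scrub.
This could be fixed by using Loofah.html5_fragment instead, as long as Nokogiri >= 1.14.0 and Loofah >= 2.21.0 are available. Discourse already uses Nokogiri::HTML5.fragment, so that would make sense.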
Note: Loofah 2.21.0 is not yet released; it is currently in RC1.