Hi there, I have some posts imported from another forum software with line breaks in headings. These are shown correctly in the post preview, but they are not rendered as headings in the final baked post. I’m seeing this issue on my forum updated yesterday and also here on Discourse Meta:
Heading test with line breaks
Normal paragraph.
Heading with no line breaks
It’s leaving the <h2>
blank and is wrapping the text after that in <p>
:
<h2 dir="ltr">
</h2><p dir="ltr">Heading test with line breaks</p>
I’m 99% sure that this is a recent regression, as I specifically checked headings in imported posts from the same user during the migration process, and the user and I both remember that they were rendered correctly.
2 Likes
sam
(Sam Saffron)
April 11, 2023, 1:16am
3
Hmm given the ltr stuff there @Osama could this be related to the CSS flipper we recently added?
2 Likes
Osama
April 12, 2023, 1:56am
4
I don’t think so because the final/cooked HTML of the post is wrong and the CSS flipper (whether the old or the new one) isn’t involved at all in the post cooking process.
^ the text inside the <p>
should be inside the <h2>
and there should be no <p>
at all (I think).
6 Likes
selase
(Selase Krakani)
May 8, 2023, 6:15pm
5
This seems to happen during cooking, specifically markdown parsing.
sanitized = markdown(working_text, options)
doc = Nokogiri::HTML5.fragment(sanitized)
add_nofollow = !options[:omit_nofollow] && SiteSetting.add_rel_nofollow_to_user_content
add_rel_attributes_to_user_content(doc, add_nofollow)
strip_hidden_unicode_bidirectional_characters(doc)
sanitize_hotlinked_media(doc)
add_mentions(doc, user_id: opts[:user_id]) if SiteSetting.enable_mentions
scrubber = Loofah::Scrubber.new { |node| node.remove if node.name == "script" }
loofah_fragment = Loofah.fragment(doc.to_html)
loofah_fragment.scrub!(scrubber).to_html
Passing a header with linebreaks to PrettyText.markdown
returns with the content of header wrapped in a p
tag.
And then during scrubbing, the paragraph get’s extracted out out of the header tag resulting in the following:
I suppose this happens since a paragraph in a header tag isn’t spec compliant?
I’m still reviewing the markdown parser code to figure out why the content is being wrapped in p
tag in the first place.
2 Likes
sam
(Sam Saffron)
May 9, 2023, 12:02am
7
Oh this is very related to:
Took a quick look out of curiosity. It looks like an issue with the Nokogiri library.
[image]
From what uses Discourse here:
Loofah.fragment uses Nokogiri’s HTML4 parser.
This could be fixed using Loofah.html5_fragment as long as Nokogiri >= 1.14.0 and Loofah >= 2.21.0. Discourse already uses Nokogiri::HTML5.fragment; that would make sense.
Note: Loofah 2.21.0 is not yet released; currently in RC1.
I am OK to wait a few more weeks for Loofah to release
Will shoot out a Tweet to Mike to mention this popped up.
2 Likes
Hi there, thanks for looking into this. HTML doesn’t care about line breaks, so technically this is correct:
<h2>
Heading test with line breaks
</h2>
It simply seems to be a problem of the Markdown parser running after/inside the HTML parser, which is generating <p>
tags for the line returns.
1 Like
nat
(Natalie T)
June 20, 2023, 12:40pm
11
Hey there, we’ve fixed this with
discourse:main
← discourse:loofah
opened 02:35AM - 12 May 23 UTC
https://meta.discourse.org/t/markdown-preview-and-result-differ/263878
The re… sult of this markdown had different results in the composer preview and the post. This is solved by updating Loofah to the latest version and using html5 fragments like our user had reported. While the change was only needed in [cooked_post_processor.rb](https://github.com/discourse/discourse/pull/21500/files#diff-67de7f44aa04f02ceba9770e5d83b4465add3bd4297be871f94a2233cd4831a7), I've updated the other areas of our codebase to also use the html5 fragment.
> <strike>
>
> Additionally, how come
>
> ```qml
> import QtQuick
> import QtQuick.Controls 2.15 as QQC2
> import QtQuick.Layouts
> import QtPositioning
> ```
>
> and prepension of `QQC2` before `Action`, `Button`, `ApplicationWindow`, and `Frame` doesn't work, whereas
>
> ```qml
> import QtQuick
> import QtQuick.Controls
> import QtQuick.Layouts
> import QtPositioning
> import QtQuick.Controls 2.15 as QQC2
> ```
>
> and addition of `QQC2.Button` (per https://discuss.kde.org/t/qml-pyqt6-cant-get-native-control-appearance/1240/2?u=rokejulianlockhart) does?
>
> </strike>
<img width="643" alt="Screenshot 2023-05-11 at 3 15 36 PM" src="https://github.com/discourse/discourse/assets/1555215/e7087502-b0a1-4219-be0b-ad3904cc6a6f">
Related:
- ~~https://github.com/discourse/discourse/pull/21500~~
- https://github.com/discourse/discourse-footnote/pull/62
- https://github.com/discourse/discourse-bbcode/pull/50
as seen here:
Heading test with line breaks
Normal paragraph.
Heading with no line breaks
8 Likes
Glad to hear, thanks @nat ! Does this require rebaking posts?
2 Likes
nat
(Natalie T)
June 20, 2023, 1:21pm
13
Yes, it does require a rebake — we can’t do it for you automatically with this fix since it’s an expensive operation.
3 Likes
nat
(Natalie T)
Closed
June 22, 2023, 12:40pm
14
This topic was automatically closed after 2 days. New replies are no longer allowed.