HTML <b> across paragraph breaks differs between preview and result


(Dan Fabulich) #1

Continuing the discussion from Multi-line inline spoiler no longer works:

I wrote this as an example:

In the editor preview pane, both paragraphs are bolded, but when I submit the post, only the first paragraph is bolded.

Regardless of what <b> should actually do in this case, the preview pane and the post result should never differ in this way.


(Sam Saffron) #2

I am pretty sure this was raised before in the context of <s>

We would need to add a full on HTML parser and normalizer to the client side or amend our CommonMark parser to handle inline and block level html blocks instead, this is not in our plans at all.


(Dan Fabulich) #3

I just want to make it clear, this isn’t just a special-case one-off fix required for the Discourse CommonMark parser. My understanding is that Discourse is using the off-the-shelf CommonMark library. If it’s not rendering unbalanced HTML tags per the HTML parsing specification, then that’s a bug in the CommonMark library.

And, more generally, if the CommonMark specification calls for “normalizing” unbalanced tags by closing the tag without reopening it, then the CommonMark specification is in violation of the HTML parsing spec, too.


(Sam Saffron) #4

The CommonMark library generates unbalanced HTML, the nokogiri library normalizes HTML and ensures that tags are balanced.

CommonMark spec has close to zero understanding about tags and special handling for lines that start with a tag. One day we may drop nokogiri. But I doubt @codinghorror is super keen to give end users the :dagger: to make arbitrary Discourse pages in the wild fail in HTML validators, just cause that is what end users want.


(Dan Fabulich) #5

I just filed a bug on nokogiri.

Good news! You can use nokogumbo instead, a drop-in replacement for nokogiri that follows the HTML5 parsing specification.

Switching to nokogumbo fixes the misnested <b> issue in my tests.

require 'rubygems'
require 'nokogumbo'
html_doc = Nokogiri::HTML5("<!DOCTYPE html><html><body><p>1<b>2<i>3</b>4</i>5</p></body></html>")
puts html_doc.to_html
<!DOCTYPE html>
<html>
<head></head>
<body><p>1<b>2<i>3</i></b><i>4</i>5</p></body>
</html>

Concretely, BBCode in MarkDown like [b] and [quote] should just be converted to <b> and <blockquote>; then let nokogumbo normalize it.

[spoiler] might require a bit more hackery. If you replace the <i> in that sample with a <span>, the 4 is not included in its own <span>, because <span> is not a “formatting element” per the spec, so it doesn’t participate in the “adoption agency algorithm.” But if you use a <font> element instead of a <span>, it triggers an adoption, and it works great.

Whaddya say?? :grin:


Spoiler with text on the same line as the tag no longer works
(Sam Saffron) #6

No, I just disagree with this

I do not like [quote] you should allow me to type 

anything into the white box [/quote] markup

I just don’t get the drama :dromedary_camel: adopt a more readable way of formatting Markdown. It is friendlier to all editors and significantly easier to parse and cope with.

I do not like 

[quote]
you should allow me to type 

anything into the white box
[/quote] 

markup

Also having to pass extra context into every inline I have is going to create edge case central, sorry this is just not going to happen.


(Dan Fabulich) #7

You keep picking on [quote], which is the least useful block-scoped example.

But do you really disagree with this example?

Do you really disagree with [spoiler]hiding content here

and hiding a little more content in the next paragraph?[/spoiler]

Your [quote] example is indeed absurd, but there’s absolutely nothing wrong with the [spoiler] sample. In fact, the editor toolbar itself will cheerfully generate code like that. (Just try selecting a few paragraphs and clicking “Blur Spoiler.”)

Switching to a correct HTML parser fixes an open bug in the preview pane with misnested <s> and <b> tags at almost no cost to you whatsoever.

Also having to pass extra context into every inline I have is going to create edge case central

What extra context? You already have to deal with this to parse HTML correctly, which Discourse is not doing today because nokogiri has bugs. Just use a compliant HTML parser, convert BBCode to HTML, and it will reduce the surface area of the code you have to maintain.


(Jeff Atwood) #8

Good point, we should fix the editor toolbar generating this broken output @eviltrout – but @sam says it is already fixed / working.


Editor toolbar generates invalid multi-paragraph spoilers
(Dan Fabulich) #9

Perhaps, but I think humans will want to type it like that, too. (And why wouldn’t they?)


(Jeff Atwood) #10

We don’t support it, full stop. So they need to desist, or choose different software that better meets their needs.


(Dan Fabulich) #11

To be clear, are you also saying about misnested HTML tags that it’s a feature that Discourse has non-spec compliant behavior?


(Sam Saffron) #12

I mean that I am not investing any more time in this

  • I am not writing a new HTML parser built into our Markdown pipeline that knows what is inline and what is block and adds special rules

  • I am not stopping what I am doing and getting rid of nokogiri, even if I did I would still insist on every post in Discourse having HTML that passes in validators. If a new engine handles it better than fine, but this is not my priority.

  • I am not adding back support for

this is a test [spoiler] i love spoilers

spoiler[/spoiler] test

Type

this is a test [spoiler] i love spoilers [/spoiler]

[spoiler]spoiler[/spoiler] test

or

this is a test

[spoiler]
i love spoilers 

spoiler
[/spoiler]

test

It is a pain to add support for it and keep carrying around context inside inlines, it effectively turns our markdown it extensions from simple and straightforward to nightmare mode

I do not support nightmare mode just to work around your [bbcode] must behave in XYZ manner.


(Jeff Atwood) #13

I think typing specifically this:

Do you really disagree with [spoiler]hiding content here

and hiding a little more content in the next paragraph?[/spoiler]

versus this

Do you really disagree with [spoiler]hiding content here and hiding a little more content in the next paragraph?[/spoiler]

or this

Do you really disagree with [spoiler]hiding content here
and hiding a little more content in the next paragraph?[/spoiler]

or this

Do you really disagree with 

[spoiler]
hiding content here and hiding a little more content in the next paragraph?
[/spoiler]

is unnecessary and unsupported.


(Dan Fabulich) #14

I have to register one more complaint, and then I promise I’ll shut up about this.

There’s an important difference between a single multi-paragraph spoiler and multiple single-paragraph spoilers: when each spoiler is in its own paragraph, the user has to click on each paragraph to unspoil it.

[spoiler]You[/spoiler]

[spoiler]have[/spoiler]

[spoiler]to[/spoiler]

[spoiler]click[/spoiler]

[spoiler]nine[/spoiler]

[spoiler]times[/spoiler]

[spoiler]to[/spoiler]

[spoiler]read[/spoiler]

[spoiler]this.[/spoiler]

It’s not uncommon on our forum for entire posts to be encapsulated in a spoiler. This change degrades the reading experience for those posts.

You

have

to

click

nine

times

to

read

this.


(Rafael dos Santos Silva) #15

Not

if

you

do

this

right?


(Dan Fabulich) #16

I can’t see the Markdown source of what you did. What did you do?


(Jeff Atwood) #17

Just a tag at the top and bottom, you can view any post raw via /raw/{topicid}/{postid}

https://meta.discourse.org/raw/66458/15


(Eli the Bearded) #18

Reply and use the :speech_balloon: icon to “quote full post” and you can see the source on any post easier than messing around with the /raw/ url trick.