Markdown rendering issue with image surrounded with HTML

angus · August 11, 2021, 4:25am

Yes, if you went fully in that direction you’d have to trade off not being able to edit content in Discourse with complete HTML integrity.

I was thinking there may be some middle ground where you just apply this approach to img tags in imported HTML posts.

But I’m second-guessing myself now. Doing this selectively would require changes in many parts of post processing.

david · August 11, 2021, 11:29am

Yeah, I think this is probably the best option. I did make a start on it back in June 2020, but it ended up being a lot of work, and I had to move on to other projects. I had a couple of approaches to allowing upload:// URLs in <img tags… neither is perfect. From my notes:

Implementation 1:

In the markdown pipeline, parse the content of each html_block (by slightly abusing the xss.js library), and process any image tags with upload:// src attributes.

Pros: all in the markdown pipeline, only does this processing on html_block tokens

Cons: kinda misusing the xss.js sanitizer. It might not be a perfect HTML5 parser

This option could be improved by using a standards-compliant javascript DOM implementation (e.g. jsdom) on the server, but that seems pretty heavyweight.

Implementation 2:

Allow upload:// src attributes all the way through the markdown pipeline, then replace them later. On the client, this is actually pretty simple - we were already replacing upload:// urls asynchronously after cooking. On the server, this does an extra processing step using Nokogiri.

Pros: parser is HTML5 standards compliant

Cons: different implementation on client/server, makes pipeline slightly more complex

https://github.com/davidtaylorhq/discourse/commit/2ae1dafeca20fdd8717fbed0eb3b6c0fa0fe0644

I think option 2 is probably the way to go. We’ll then need to update the pull_hotlinked_images job to maintain <img tags, without replacing them with Markdown. I hope I can find time to get back to this soon

mattdm · August 20, 2021, 9:06pm

I’m really not understanding the complication here. Clearly, the HTML image tag is being replaced with markdown – e.g. ![](upload://6zqK52dO23i1JsYH2oyMU12U2ro.jpeg). Why not just make that include two carriage returns before the ! ? That’ll make it render right, and allow the image upload feature to work to prevent broken images and cross-site issues.

Is there a real-world non-theoretical situation where that whitespace might cause a problem? Is that problem worse than the “images are just broken all of the time” state of the current plugin?

@david, you note that “the newline solution will probably not happen” because the “key thing for us is to maintain the integrity of the content”. But the content is already changed by inserting the tags in the first place. I… really don’t get how this is possibly better.

Right now every time someone includes an image in their wordpress post, the images come back broken, and I am frequently having to deal with “images are broken” comments, now often followed by “yes, it’s bceause discourse sucks” replies. I would like to avoid both of these things.

I understand that the “don’t download images” setting may be a workaround, but actually I do want images to be downloaded, so hopefully that can just be a temporary fix.

david · March 29, 2022, 10:51am

This should be resolved by:

github.com/discourse/discourse

FEATURE: Support upload:// urls in img tags (#16277)

committed 03:46PM - 28 Mar 22 UTC

davidtaylorhq

+128 -54

Previously, our `upload://` protocol urls were only supported in markdown image …tags. This meant that our PullHotlinkedImages job was forced to convert `<img` tags to markdown. Depending on the exact syntax, this can actually cause the image to break. This commit adds support for `upload://` inside regular HTML `<img` tags. In a future commit, we'll be able to use this to make our PullHotlinkedImages job much more robust. Context at https://meta.discourse.org/t/152801

Let’s try it here on Meta.

<div>
Image in an HTML block:
<img src="..." width=100 height=100>
</div>

Image in an HTML block:

david · April 4, 2022, 7:00am

This topic was automatically closed after 5 days. New replies are no longer allowed.

Topic		Replies	Views
Thumbnail generation & markdown rendering issue Support	14	1904	May 30, 2020
Fix broken images for posts created by the WP Discourse and RSS plugins Administrators wordpress , rss-polling , how-to	33	4034	August 20, 2021
Run-together markdown formatting mixed with html makes images not load Support	13	1266	August 20, 2021
Images not publishing to Discourse in WP 5.3 WordPress	26	4036	August 13, 2020
System auto converts images in HTML into markdown, breaking them Support	5	687	July 17, 2022

Markdown rendering issue with image surrounded with HTML

Implementation 1:

Implementation 2:

Related topics