Markdown rendering issue with image surrounded with HTML

Yes, if you went fully in that direction you’d have to trade off not being able to edit content in Discourse with complete HTML integrity.

I was thinking there may be some middle ground where you just apply this approach to img tags in imported HTML posts.

But I’m second-guessing myself now. Doing this selectively would require changes in many parts of post processing.

1 Like

Yeah, I think this is probably the best option. I did make a start on it back in June 2020, but it ended up being a lot of work, and I had to move on to other projects. I had a couple of approaches to allowing upload:// URLs in <img tags… neither is perfect. From my notes:


Implementation 1:

In the markdown pipeline, parse the content of each html_block (by slightly abusing the xss.js library), and process any image tags with upload:// src attributes.

Pros: all in the markdown pipeline, only does this processing on html_block tokens

Cons: kinda misusing the xss.js sanitizer. It might not be a perfect HTML5 parser

This option could be improved by using a standards-compliant javascript DOM implementation (e.g. jsdom) on the server, but that seems pretty heavyweight.

Implementation 2:

Allow upload:// src attributes all the way through the markdown pipeline, then replace them later. On the client, this is actually pretty simple - we were already replacing upload:// urls asynchronously after cooking. On the server, this does an extra processing step using Nokogiri.

Pros: parser is HTML5 standards compliant

Cons: different implementation on client/server, makes pipeline slightly more complex


I think option 2 is probably the way to go. We’ll then need to update the pull_hotlinked_images job to maintain <img tags, without replacing them with Markdown. I hope I can find time to get back to this soon :crossed_fingers:

3 Likes

I’m really not understanding the complication here. Clearly, the HTML image tag is being replaced with markdown – e.g. ![](upload://6zqK52dO23i1JsYH2oyMU12U2ro.jpeg). Why not just make that include two carriage returns before the ! ? That’ll make it render right, and allow the image upload feature to work to prevent broken images and cross-site issues.

Is there a real-world non-theoretical situation where that whitespace might cause a problem? Is that problem worse than the “images are just broken all of the time” state of the current plugin?

@david, you note that “the newline solution will probably not happen” because the “key thing for us is to maintain the integrity of the content”. But the content is already changed by inserting the tags in the first place. I… really don’t get how this is possibly better.

Right now every time someone includes an image in their wordpress post, the images come back broken, and I am frequently having to deal with “images are broken” comments, now often followed by “yes, it’s bceause discourse sucks” replies. I would like to avoid both of these things.

I understand that the “don’t download images” setting may be a workaround, but actually I do want images to be downloaded, so hopefully that can just be a temporary fix.