Markdown rendering issue with image surrounded with HTML

Yeah, I think this is probably the best option. I did make a start on it back in June 2020, but it ended up being a lot of work, and I had to move on to other projects. I had a couple of approaches to allowing upload:// URLs in <img> tags; neither is perfect. From my notes:


Implementation 1:

In the markdown pipeline, parse the content of each html_block (by slightly abusing the xss.js library), and process any image tags with upload:// src attributes.

Pros: all in the markdown pipeline, only does this processing on html_block tokens

Cons: kinda misusing the xss.js library, which is a sanitizer rather than a full HTML5 parser, so it might not parse every edge case correctly

This option could be improved by using a standards-compliant JavaScript DOM implementation (e.g. jsdom) on the server, but that seems pretty heavyweight.
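To make the idea concrete, here is a rough sketch of Implementation 1 in Python rather than the real JavaScript pipeline. It only looks at html_block tokens and collects any upload:// src values found on img tags. The `Token` class, the `"inline"` token type, and the function names are made up for illustration, and Python's `html.parser` stands in for the xss.js parsing step (it is not a full HTML5 parser either, which is the same weakness noted above).

```python
from dataclasses import dataclass
from html.parser import HTMLParser

UPLOAD_SCHEME = "upload://"


@dataclass
class Token:
    """Hypothetical stand-in for a markdown pipeline token."""
    type: str      # e.g. "html_block", "inline"
    content: str   # raw source of the token


class UploadSrcCollector(HTMLParser):
    """Collects src values of <img> tags that use the upload:// scheme."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value and value.startswith(UPLOAD_SCHEME):
                self.found.append(value)


def upload_srcs_in_html_blocks(tokens):
    """Return upload:// image sources, looking only inside html_block tokens."""
    found = []
    for token in tokens:
        if token.type != "html_block":
            continue  # every other token type is left to the normal pipeline
        collector = UploadSrcCollector()
        collector.feed(token.content)
        collector.close()
        found.extend(collector.found)
    return found
```

The collected short URLs would then be resolved and written back into the block, which is the part that needs a parser you can trust.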

Implementation 2:

Allow upload:// src attributes all the way through the markdown pipeline, then replace them later. On the client, this is actually pretty simple: we were already replacing upload:// URLs asynchronously after cooking. On the server, this adds an extra processing step using Nokogiri.

Pros: parser is HTML5 standards compliant

Cons: different implementation on client/server, makes pipeline slightly more complex
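As a rough illustration of the server-side step in Implementation 2, the sketch below rewrites upload:// src attributes in already-cooked HTML using a lookup table. It is written in Python, with the standard library's `html.parser` standing in for Nokogiri (unlike Nokogiri's HTML5 mode, it is not a standards-compliant HTML5 parser). The `UploadUrlRewriter` class, the `resolve_upload_urls` function, and the URLs in the lookup table are all hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class UploadUrlRewriter(HTMLParser):
    """Re-emits HTML unchanged, except for <img> tags whose src is in `lookup`."""

    def __init__(self, lookup):
        # convert_charrefs=False keeps entities in text as separate events,
        # so they can be written back out exactly as they came in.
        super().__init__(convert_charrefs=False)
        self.lookup = lookup
        self.out = []

    def _render(self, tag, attrs, self_closing):
        src = dict(attrs).get("src")
        if tag != "img" or not src or src not in self.lookup:
            return self.get_starttag_text()  # untouched original markup
        parts = [tag]
        for name, value in attrs:
            if name == "src":
                value = self.lookup[src]
            # Attribute values arrive unescaped, so escape them again.
            parts.append(name if value is None
                         else f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        self.out.append(self._render(tag, attrs, self_closing=False))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def resolve_upload_urls(cooked_html, lookup):
    """Replace known upload:// image sources; unknown ones are left as-is."""
    rewriter = UploadUrlRewriter(lookup)
    rewriter.feed(cooked_html)
    rewriter.close()
    return "".join(rewriter.out)
```

Leaving unknown upload:// URLs untouched means a later pass (or the client) can still try to resolve them.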


I think option 2 is probably the way to go. We’ll then need to update the pull_hotlinked_images job so that it keeps <img> tags as they are, instead of replacing them with Markdown. I hope I can find time to get back to this soon :crossed_fingers:
