Yeah, I think this is probably the best option. I did make a start on it back in June 2020, but it ended up being a lot of work, and I had to move on to other projects. I had a couple of approaches to allowing `upload://` URLs in `<img>` tags… neither is perfect. From my notes:
Implementation 1:
In the markdown pipeline, parse the content of each html_block (by slightly abusing the `xss.js` library) and process any image tags with upload:// src attributes.
Pros: everything stays in the markdown pipeline, and the processing only runs on html_block tokens
Cons: kinda misuses the `xss.js` sanitizer, which might not be a perfect HTML5 parser
This option could be improved by using a standards-compliant JavaScript DOM implementation (e.g. jsdom) on the server, but that seems pretty heavyweight.
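To make the trade-off concrete, here is a minimal TypeScript sketch of the rewrite step in Implementation 1. It assumes a hypothetical resolver callback that maps an upload:// short URL to a real URL (not Discourse's actual API), and it deliberately uses regexes instead of xss.js, so it is even less of an HTML5 parser than the Cons above warn about: unquoted attributes, or a `>` inside an attribute value, will trip it up.

```typescript
// Hypothetical resolver: maps "upload://..." to a real URL, or undefined if unknown.
type Resolver = (shortUrl: string) => string | undefined;

// Matches a whole <img ...> start tag. It stops at the first ">", so a ">"
// inside an attribute value cuts the tag short (one of the non-HTML5 shortcuts).
const IMG_TAG = /<img\b[^>]*>/gi;

// Matches a quoted src attribute whose value starts with upload://.
// Unquoted values (src=upload://...) are not handled.
const UPLOAD_SRC = /(\ssrc\s*=\s*)(["'])(upload:\/\/[^"']+)\2/i;

// Escape a URL for use inside a double-quoted attribute value.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Rewrite upload:// src attributes inside the raw HTML of one html_block token.
function rewriteUploadSrcs(html: string, resolve: Resolver): string {
  return html.replace(IMG_TAG, (tag: string) =>
    tag.replace(UPLOAD_SRC, (match: string, prefix: string, _quote: string, shortUrl: string) => {
      const url = resolve(shortUrl);
      // Leave the tag untouched if the short URL cannot be resolved.
      return url === undefined ? match : `${prefix}"${escapeAttr(url)}"`;
    })
  );
}

// Usage with a made-up lookup table:
const known = new Map([["upload://abc123.png", "/uploads/abc123.png"]]);
console.log(rewriteUploadSrcs('<div><img alt="cat" src="upload://abc123.png"></div>', (u) => known.get(u)));
// prints <div><img alt="cat" src="/uploads/abc123.png"></div>
```

Everything outside `<img>` tags, including `data-src` attributes and links, is left alone, which mirrors the "only touch html_block image tags" scope of this option.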
Implementation 2:
Allow upload:// src attributes all the way through the markdown pipeline, then replace them later. On the client, this is actually pretty simple: we were already replacing upload:// URLs asynchronously after cooking. On the server, it adds an extra processing step using Nokogiri.
Pros: the parser is HTML5 standards-compliant
Cons: different implementations on client and server, and it makes the pipeline slightly more complex
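The client half of Implementation 2 can be sketched as a post-cooking pass over the rendered images. This is a minimal TypeScript sketch, not Discourse's code: `ImgLike` is a hypothetical interface covering just the `getAttribute`/`setAttribute` methods that real DOM `<img>` elements also have, and `lookupUrls` is a hypothetical stand-in for whatever batch request resolves short URLs.

```typescript
// The subset of HTMLImageElement this sketch needs, so real DOM <img>
// elements or test doubles can be passed in.
interface ImgLike {
  getAttribute(attr: string): string | null;
  setAttribute(attr: string, value: string): void;
}

// Hypothetical batch lookup: resolves many upload:// short URLs in one call.
type BatchLookup = (shortUrls: string[]) => Promise<Map<string, string>>;

const UPLOAD_PREFIX = "upload://";

// After cooking, swap any remaining upload:// src attributes for real URLs.
// Returns how many images were updated.
async function resolveUploadImages(imgs: ImgLike[], lookupUrls: BatchLookup): Promise<number> {
  const pending = imgs.filter((img) => (img.getAttribute("src") ?? "").startsWith(UPLOAD_PREFIX));
  if (pending.length === 0) return 0;

  // De-duplicate so each short URL is requested only once.
  const shortUrls = Array.from(new Set(pending.map((img) => img.getAttribute("src") as string)));
  const resolved = await lookupUrls(shortUrls);

  let updated = 0;
  for (const img of pending) {
    const url = resolved.get(img.getAttribute("src") as string);
    // Unresolved images keep their upload:// src.
    if (url !== undefined) {
      img.setAttribute("src", url);
      updated++;
    }
  }
  return updated;
}
```

In a browser, `Array.from(element.querySelectorAll("img"))` would supply real elements. Collecting distinct URLs first keeps it to one request per cooked post rather than one per image; the server-side Nokogiri step would presumably follow the same collect, resolve, replace shape.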
https://github.com/davidtaylorhq/discourse/commit/2ae1dafeca20fdd8717fbed0eb3b6c0fa0fe0644
I think option 2 is probably the way to go. We’ll then need to update the pull_hotlinked_images job to preserve `<img>` tags rather than replacing them with Markdown. I hope I can find time to get back to this soon.