Aha! @simon will upgrading the wordpress plugin fix old posts? Or only new posts?
The fix only applies to posts that are published with the WordPress Block editor. It will fix old posts if the “Update Discourse Topic” button is clicked on WordPress. It will need to be done manually for each post unless someone writes a script to loop through old posts.
Let’s try this HTML:
I think this should be fixed in Discourse as well, just in case posts are made by a non-wordpress integration. To summarise:
<p><img src="..."/></p>. This is perfectly valid HTML
pull_hotlinked_imagesfetched the image, and replaced the markup with
This does not render
So there are two possible fixes here. Either:
- We fix InlineUploads so that it adds a blank line in the markup. This renders fine:
<p> !(upload://6zqK52dO23i1JsYH2oyMU12U2ro.jpeg) </p>
- We fix the markdown engine to render
<p>!(upload://6zqK52dO23i1JsYH2oyMU12U2ro.jpeg)</p>. Note that this does not render in the commonmark demo either.
@sam do you know if there’s a deliberate reason that markdown images won’t render on the same line as a
This is part of the CommonMark spec
Absolutely do not want to deviate from the spec here. I guess we fix pull hotlink images to allow for this by injecting 2 newlines for cases like this. I think it is rather rare though and sort of self inflicted.
I don’t think it’s that rare, especially when Discourse is associated to popular tools such as WP-Discourse or any tools which uses the API.
Please consider to add a blank line. It does not seems that would be a breaking change and fairly easy to implement.
@Arkshine we have been discussing this a lot internally. The key thing for us is to maintain the integrity of the content, so the newline solution will probably not happen.
But we will definitely be doing something - having the pull_hotlinked_images job destroy the images is not acceptable. Hope to have a solution soon
A workaround for this issue is to prevent Discourse from downloading the remote images. This can be done by adding the image domain to the
disabled image download domains site setting. It is also possible to prevent Discourse from downloading all remote images by disabling the
download remote images to local site setting. See Fix broken images for posts created by the WP Discourse and RSS plugins for details.
In our case, we can’t because we’re using the official topic thumbnail component which requires a local image. We solved the issue by adding newlines before any
<img> in the content before topic being created with WP-Discourse. Not a solution for everyone but it works for us. A bit sad Discourse doesn’t support this legal usage.
But yes, if you are not tightened to a plugin/component and/or can’t fix the content before topic is created, this is for sure a reasonable workaround.
We are still planning to fix the issue. Unfortunately it is a problem quite deep in our markdown rendering system which is complex to fix. But we will get to it - sorry it is taking so long!
I’m just adding a note here that the issue affects posts with images that are created via the Discourse RSS plugin as well.
Sorry for the multiple posts in this topic, but the issue also affects images in posts that are created through our Zendesk plugin when the
sync comments from zendesk setting is enabled. The difficulty with this case is that it’s not possible to know the source of the images before hand, so the workaround of adding the image src to the
disabled image download domains setting will not work.
Is there any way we could just prevent remote images from being downloaded to local if the image tag is wrapped in HTML tags?
I am afraid that is completely off the table, if we did something like this we would allow third parties to track usage on a forum by injecting a tracking gif. The download remote image thing is a part security feature.
Instead I think we need a “smarter” system that works similar to the way @tgxworld built our image remappers a few years ago, one that works backwards from HTML and ensures stability of the change by re-cooking. Very complex change sadly.
This issue has cropped up again
Just thinking out-loud here, but I wonder if we can elide the tricky problem here (i.e. the conversion of HTML to markdown). To recap (just to help think this through)
Discourse supports the importation of HTML for the creation of post content (e.g. HTML from WP Discourse).
In some contexts the user expects the integrity of the original HTML to be retained exactly.
“integrity” here has at least two aspects:
- How the content is rendered, e.g. linebreaks
- Where media is hosted, e.g. downloading images to local to avoid broken images, or potentially for security concerns
The conversion of HTML to markdown potentially creates issues for the first type of integrity, however it is currently necessary to ensure the second type of integrity.
So perhaps one way to address this issue for certain imported posts would be for the imported HTML to be stored directly as the cooked post content, and the
pull_hotlinked_images job would support downloading images in such content without converting
img to markdown.
Yes, put more simply, perhaps the code could support downloading hotlinked images without requiring a conversion of the
img to markdown. For such posts you would interpolate the downloaded image url in the cooked content instead of the raw.
The tricky thing is, how would you ever edit posts with that flag. The editor would be in raw html mode and the entire toolbar would be broken, etc.
Yes, if you went fully in that direction you’d have to trade off not being able to edit content in Discourse with complete HTML integrity.
I was thinking there may be some middle ground where you just apply this approach to
img tags in imported HTML posts.
But I’m second-guessing myself now. Doing this selectively would require changes in many parts of post processing.
Yeah, I think this is probably the best option. I did make a start on it back in June 2020, but it ended up being a lot of work, and I had to move on to other projects. I had a couple of approaches to allowing
upload:// URLs in
<img tags… neither is perfect. From my notes:
In the markdown pipeline, parse the content of each html_block (by slightly abusing the
xss.js library), and process any image tags with upload:// src attributes.
Pros: all in the markdown pipeline, only does this processing on html_block tokens
Cons: kinda misusing the
xss.js sanitizer. It might not be a perfect HTML5 parser
Allow upload:// src attributes all the way through the markdown pipeline, then replace them later. On the client, this is actually pretty simple - we were already replacing upload:// urls asynchronously after cooking. On the server, this does an extra processing step using Nokogiri.
Pros: parser is HTML5 standards compliant
Cons: different implementation on client/server, makes pipeline slightly more complex
I think option 2 is probably the way to go. We’ll then need to update the pull_hotlinked_images job to maintain
<img tags, without replacing them with Markdown. I hope I can find time to get back to this soon
I’m really not understanding the complication here. Clearly, the HTML image tag is being replaced with markdown – e.g.
!(upload://6zqK52dO23i1JsYH2oyMU12U2ro.jpeg). Why not just make that include two carriage returns before the
! ? That’ll make it render right, and allow the image upload feature to work to prevent broken images and cross-site issues.
Is there a real-world non-theoretical situation where that whitespace might cause a problem? Is that problem worse than the “images are just broken all of the time” state of the current plugin?
@david, you note that “the newline solution will probably not happen” because the “key thing for us is to maintain the integrity of the content”. But the content is already changed by inserting the tags in the first place. I… really don’t get how this is possibly better.
Right now every time someone includes an image in their wordpress post, the images come back broken, and I am frequently having to deal with “images are broken” comments, now often followed by “yes, it’s bceause discourse sucks” replies. I would like to avoid both of these things.
I understand that the “don’t download images” setting may be a workaround, but actually I do want images to be downloaded, so hopefully that can just be a temporary fix.
This should be resolved by:
Let’s try it here on Meta.
<div> Image in an HTML block: <img src="..." width=100 height=100> </div>
This topic was automatically closed after 5 days. New replies are no longer allowed.