Fix broken images for posts created by the WP Discourse and RSS plugins

There is a case where images published to Discourse through the WP Discourse and RSS plugins can be broken. This can happen when the full post content is published to Discourse with the WP Discourse plugin and the WordPress Classic Editor is used for publishing the post. It can also happen with posts pulled to Discourse with the RSS Polling plugin when the Truncate the embedded posts Embedding setting is not enabled.

The problem happens when Discourse attempts to download images that have been added to the post. If downloading the remote image results in a markdown image tag wrapped in HTML tags, the image will be broken.

If posts are being published from WordPress, the issue should be solved by switching from using the Classic Editor to using the Block Editor for publishing the posts. If this is not possible, or if it’s not resolving the issue, a workaround for the problem is to prevent Discourse from downloading the remote images.

If you know the domains that the remote images are being published from, you can prevent Discourse from downloading these images by adding the domain(s) to the disabled image download domains site setting.

If you are unsure of all domains that are being used, you can prevent Discourse from downloading all remote images by disabling the download remote images to local site setting. Note that disabling this setting could result in broken images on your site. If possible, it is better to only prevent downloading of remote images from specific domains that you control.
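If you are not sure which domains a post pulls images from, a small script can list them. The following is a minimal sketch, not part of WP Discourse or Discourse: the names `ImageDomainCollector` and `image_domains` are made up for illustration, and it assumes you have the post HTML available as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageDomainCollector(HTMLParser):
    """Collects the host name of every <img src="..."> found in the HTML."""

    def __init__(self):
        super().__init__()
        self.domains = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        host = urlparse(src).hostname  # None for relative URLs
        if host:
            self.domains.add(host)


def image_domains(post_html):
    """Return the sorted list of remote domains used by images in post_html."""
    collector = ImageDomainCollector()
    collector.feed(post_html)
    collector.close()
    return sorted(collector.domains)


if __name__ == "__main__":
    # Hypothetical sample: one remote image and one relative (local) image.
    sample = (
        '<p><img src="https://static.xx.fbcdn.net/images/emoji.php/v9/t34/1/16/1f914.png"/></p>'
        '<img src="/wp-content/uploads/local.png">'
    )
    print(image_domains(sample))
```

Images with relative URLs are skipped, since they have no domain to list.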

May I ask for more details concerning this? All my sites use the Classic Editor, but very few use a plugin to render markdown as input (the plugin space dried up in markdown parsers, so folks reach for Jetpack most times).

Is this the case when a markdown parser is used on top of the Classic Editor? :thinking:

The issue happens when HTML in the following form gets posted to Discourse. It’s most likely to occur when a topic is posted to Discourse via the API:

<p><img src="remote-image-domain/..."/></p>

Any outer tags around the image tag will cause the issue, for example <figure><img src="remote-image-domain/..."/></figure>
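To check whether a post contains images in this form before it is sent, something like the following could be used. This is only an illustrative sketch using Python's standard `html.parser`; `WrappedImageFinder` and `find_wrapped_images` are hypothetical names, and nothing like this ships with either plugin.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they cannot wrap an image.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class WrappedImageFinder(HTMLParser):
    """Records every <img> that appears inside at least one other HTML tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # currently open (non-void) tags, outermost first
        self.wrapped = []    # (src, [enclosing tags]) for each wrapped image

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            if self.open_tags:
                self.wrapped.append((dict(attrs).get("src"), list(self.open_tags)))
            return
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the innermost matching tag, dropping unclosed tags inside it.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def find_wrapped_images(post_html):
    """Return (src, enclosing_tags) for each image nested in other tags."""
    finder = WrappedImageFinder()
    finder.feed(post_html)
    finder.close()
    return finder.wrapped


if __name__ == "__main__":
    print(find_wrapped_images('<p><img src="remote-image-domain/..."/></p>'))
```

An image that sits at the top level of the HTML, with no enclosing tag, is not reported.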

When Discourse attempts to download the remote image, the following markdown would be generated for the first example:

<p>![](upload://6zqK52dO23i1JsYH2oyMU12U2ro.jpeg)</p>

This will create a broken image. It can be fixed manually by editing the Discourse post to:

<p>

![](upload://6zqK52dO23i1JsYH2oyMU12U2ro.jpeg)
</p>

but just preventing Discourse from downloading the remote image with the disabled image download domains site setting is an easier way to fix it.
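If you do need to repair existing posts, the manual edit shown above can be expressed as a text substitution on the raw post. This is a rough sketch under the assumption that the broken pattern is exactly an `upload://` markdown image sitting directly between an opening and a closing HTML tag; `unwrap_upload_images` is a hypothetical helper, not something Discourse or WP Discourse provides.

```python
import re

# A markdown image whose target is an upload:// short URL.
UPLOAD_IMAGE = r"!\[[^\]]*\]\(upload://[^)\s]+\)"

# The same image placed directly between an opening and a closing HTML tag.
WRAPPED_IMAGE = re.compile(
    r"(<[a-zA-Z][^<>]*>)[ \t]*(" + UPLOAD_IMAGE + r")[ \t]*(</[a-zA-Z][^<>]*>)"
)


def unwrap_upload_images(raw):
    """Put a blank line before, and a newline after, each wrapped image."""
    return WRAPPED_IMAGE.sub(r"\1\n\n\2\n\3", raw)


if __name__ == "__main__":
    broken = "<p>![](upload://6zqK52dO23i1JsYH2oyMU12U2ro.jpeg)</p>"
    print(unwrap_upload_images(broken))
```

Running the function on text that has already been fixed leaves it unchanged, because the image is no longer directly adjacent to the tags.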

For posts published from WP Discourse with the Block Editor, the plugin attempts to fix the issue by processing the post with the following code before publishing it to Discourse:

https://github.com/discourse/wp-discourse/blob/master/lib/template-functions.php#L197-L230

It might be possible to implement a similar fix for the Classic Editor, but with the Classic Editor the WordPress parse_blocks function isn’t available, so the fix would be more complex. My hope is that the issue can eventually be taken care of with changes to the core Discourse code.

Thanks so much Simon! I understand the issue, great explanation. :slight_smile:

Hi Simon,
Thank you for making WP Discourse. :slight_smile:

I also had this problem with images. I use this to download images locally, and that broke the images as you explain above. After that, I convert the WordPress HTML to Markdown and paste the converted text into Discourse. It works fine, but it’s a manual process.
Is it possible to integrate a converter so this happens automatically when exporting from WordPress?

Thank you!

If you are using the WordPress block editor for publishing posts, the conversion should happen automatically. If you are using the Classic Editor, you’ll need to manually fix the HTML on Discourse to prevent broken images.

Let me know if you are using the Block Editor, but are still having issues with broken images.

It could be possible to add similar functionality to posts published with the Classic Editor, but the code required to do it would be more complex than what’s being done with the Block Editor.

I am using the block editor (Gutenberg), but there are some 3rd party plugins installed in it. Maybe that causes the issue with broken images. I use some 3rd party gallery plugins on WordPress as well.

The gallery plugin could be the cause of the issue. Before the WP Discourse plugin sets the post content that gets published to Discourse, it looks for any blocks in the post that have their blockName set to core/image or core/gallery. HTML for images in those blocks is rewritten into a form that can be parsed by Discourse.

It seems possible that image plugins used on your site may be using block names that are not being handled. What is the name of the gallery plugin you’re using?

I see… I am using this, but I just noticed that it is no longer supported. So I think I will convert the images back to the default gallery and try to update the Discourse topics. This is probably the problem, sorry about that.

I switched to the Block editor (it has to be done at some point since the classic editor support will end next year), but it didn’t fix the issue. The images were hosted on Facebook.

Are you able to check the image markup on the WordPress post by selecting the ‘Code editor’ from the sidebar? What I’m wondering is what kind of block (if any) the images are in:

The WordPress plugin is using block names to parse the images. If the image isn’t in a block that the plugin is currently handling, its markup won’t be cleaned up.
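One way to see which blocks the images in a post belong to is to scan the post’s raw content (the text shown in the code editor) for block comment delimiters. The sketch below is an assumption-laden illustration: it uses a rough regular expression rather than WordPress’s real block parser, it attributes each image to the innermost enclosing block, and `image_block_names` and `unhandled_image_blocks` are hypothetical names. The two handled block names are the ones mentioned earlier in this topic.

```python
import re

# Block names the WP Discourse plugin is described as handling in this topic.
HANDLED_BLOCKS = {"core/image", "core/gallery"}

# Matches a block comment delimiter, or the start of an <img> tag.
# Rough approximation of the block grammar, not a full parser.
TOKEN = re.compile(
    r"<!--\s+(?P<closer>/)?wp:(?P<name>[a-z0-9_/-]+)\s+"
    r"(?:(?P<attrs>\{[^>]*?\})\s+)?(?P<void>/)?-->"
    r"|(?P<img><img\b)"
)


def image_block_names(post_content):
    """Return, for each <img>, the innermost enclosing block name (or None)."""
    stack = []
    found = []
    for match in TOKEN.finditer(post_content):
        if match.group("img"):
            found.append(stack[-1] if stack else None)
            continue
        name = match.group("name")
        if "/" not in name:
            name = "core/" + name  # core blocks omit their namespace
        if match.group("closer"):
            if name in stack:
                while stack.pop() != name:
                    pass
        elif not match.group("void"):
            stack.append(name)
    return found


def unhandled_image_blocks(post_content):
    """Return the sorted block names that contain images but are not handled."""
    names = {name or "(no block)" for name in image_block_names(post_content)}
    return sorted(names - HANDLED_BLOCKS)


if __name__ == "__main__":
    # Hypothetical post content: an image block, a Custom HTML block,
    # and an image outside of any block.
    sample = (
        '<!-- wp:image {"id":5} -->\n'
        '<figure class="wp-block-image"><img src="https://example.com/a.png" alt=""/></figure>\n'
        "<!-- /wp:image -->\n"
        "<!-- wp:html -->\n"
        '<div dir="auto"><span><img src="https://example.com/emoji.png" alt=""></span></div>\n'
        "<!-- /wp:html -->\n"
        '<p><img src="https://example.com/b.png"></p>'
    )
    print(unhandled_image_blocks(sample))
```

Any name this reports is a candidate for images whose markup is not being cleaned up. Whether images nested inside a third-party container block are handled is not covered by this sketch.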

The WP post was a copy-paste from Facebook, here’s a sample of the HTML code.
The images were image emojis:

<div dir="auto"><span class="pq6dq46d tbxw36s4 knj5qynh kvgmc6g5 ditlmg2l oygrvhab nvdbi5me sf5mxxl7 gl3lb2sf hhz5lgdu"><img src="https://static.xx.fbcdn.net/images/emoji.php/v9/t34/1/16/1f914.png" alt="🤔" width="16" height="16"></span>Comment ? Vous avez 1 mois pour nous envoyer vos plus beaux poèmes et/ou dessins sur le thème du monocycle, ce qu’il vous évoque, votre passion pour ce sport, etc.</div>

I don’t have the same sidebar as you in the block editor, so I displayed the block HTML content with this option:

If the issue happens because it’s not “regular” WP content but an HTML copy-paste, that’s not a problem. I’ll tell my users to avoid copy-pasting images, even emojis. :slight_smile:

Yes, I think the issue here is that the HTML was copied into the WordPress post. The WP Discourse plugin should be able to handle images that are added through an image block. It’s not set up to fix the HTML for images that are added in any other way.

Ideally, Discourse would be able to handle HTML image tags that are wrapped in other HTML tags, but it’s a tricky problem. Possibly the WP Discourse plugin can be updated to handle images that are added outside of image blocks. My hope was that dealing with image blocks would cover most cases, but there seem to be a lot of exceptions to that.

Hello,

I have read through this topic and the other main topic dealing with images.

Publishing an excerpt from my site to Discourse works perfectly. However, when I click the Show Full Post button, it seems to enter a loading loop and never loads the full post (or does anything else for that matter).

If I try to publish the full post to Discourse, that also works with a few quirks:

  1. images don’t load (which is how I found these topics);
  2. it loads the full post (I have several buttons/links within each post’s contents that may confuse the plugin), but it also loads a perfectly formatted post excerpt at the end of the full post for some reason. In other words, it loads the full post (minus the images), and then loads another excerpt of the same post at the bottom of the post.

One thing to note: my WP site is in a staging mode and is not HTTPS. My Discourse site is HTTPS. I thought that the loading of the full post might be the issue with the staging site, but other things seem to work (ex. forcing category updates).

I understand this is a complex issue. Having a plugin format someone else’s posts when we are all doing different things would be incredibly challenging and I think the Discourse team has done an admirable job. I am just trying to find a workaround that is as simple as possible. Perhaps just Oneboxing the link to the post? At least the post (or a link to it) would be on the Discourse site, but there would not be any coordinated back and forth.

Thanks for any suggestions anyone may have.

The problem may be that Discourse is not able to find any content on the WordPress page. I wouldn’t expect this to cause a loop though. Discourse should just fail silently if it can’t find any content on the page. Just in case you are testing this with a post that has no real content, try creating a post with some actual text content and see if that makes a difference. You might also want to look into How to configure the allowed embed selectors setting. The allowed embed selectors setting can be used to help Discourse find the page’s content.

Are you using the Block Editor to publish your WordPress posts? If so, how are the images being added to the posts? Are you using a plugin that adds custom image blocks?

Does the post excerpt that it loads include the post’s images?

I wouldn’t expect the excerpt to be automatically loaded here. What I would expect is for the “Show Full Post” button to be displayed. Clicking that button should load the excerpt. When full post content is published from WordPress to Discourse, you can prevent the “Show Full Post” button from being displayed by disabling the Discourse embed truncate site setting.

This might be a good solution for you. Have a look at WP Discourse template customization for details about how to customize the template that’s used to publish posts. There is an example template for publishing the post as a onebox here.

Hello,

Thank you, Simon, for your very quick reply.

I have worked through your suggestions but have not been able to change the outcome which is probably more of an issue with my Discourse skills than anything else.

Your first suggestion was to ensure the post had some content. The posts did have real content so I don’t think that was the problem. You also suggested I work through the How to Configure the allowed embed selectors setting, which I also did and have not yet noticed any difference. I kept this setting very simple and even included the

tags and one other (fairly generic) CSS class, but when I click Show Full Post, it still just says Loading.

I am using the Block Editor. I don’t have any specific image plugins, but I do use Genesis Blocks. However, the image that Discourse is trying to pull in is the Featured Image for the post, which I believe is vanilla Wordpress.

As for your question “Does the post excerpt that it loads include the post’s images?” - No, the post does not include any images. However, if I click on where the image should be, it links me back to the original article.

I will look further into the Oneboxing option. Thank you for the links to that topic as well.

That might be the cause of the issue with that image. I’ll look into that.

I’m not sure what could be going on with this. If your Discourse site is public, can you share a link to a topic that has that issue? You can send it to me in a PM if you prefer.

Thanks again for this. The site is still staging and is not public. Could this be the problem?

Also, for some reason the Oneboxing doesn’t work. If I paste a link from, for example CNN.com, the Oneboxing works great, but from my staging, non-https, site, it just pastes a link. Is that because the staging site is not secure?

Having the Discourse site private should not cause a problem. Are you hiding the WordPress site in some way? If the WordPress site is blocking the requests from Discourse to get the full post content, that would cause an issue.

The Discourse site is public. I would love to hide it during development, but I couldn’t figure out how to stage it. I have a main WP site that is in staging/development, and that site is private/hidden.

I am able to force a category update from the WP plugin. I assume, probably incorrectly, that meant Discourse could contact the main site even though the main site was staging.