Run-together markdown formatting mixed with HTML makes images not load

Here’s a snippet of a post created on the Fedora Project discussion site from our Community Blog:

<p>Tell us what you think. Would you use this as your homepage now that it has a search engine field? How do you think we can further improve it?</p> ![](upload://na9g3dGvhEU753JEnGrz8xER4XS.png)<p class="has-text-align-center">OR</p> ![](upload://def2zSzNAJtyuOorvTI2eV7rGW1.png)<p>In you are interested in seeing more, check out the <a href="https://discussion.fedoraproject.org/t/how-do-you-feel-about-the-new-design-of-start-fedoraproject-org-page/28689">draft on Figma</a>. </p>

See how those markdown image lines don’t actually start on new lines? Because of that, the images don’t display. If I edit the message and add a line break before each ![], the images display properly.
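For anyone who wants to automate that manual fix, here is a minimal Python sketch. The function name and regex are hypothetical and not part of WP Discourse or Discourse. It moves each `![](upload://...)` image onto its own line with a blank line before and after. It uses a blank line rather than a single newline, to be safe: per CommonMark, an HTML block that starts with a tag such as `<p>` only ends at a blank line, so markdown on the next non-blank line can still be swallowed into that HTML block.

```python
import re

# Hypothetical helper: matches a markdown image such as ![](upload://abc.png)
# or ![alt|690x388](upload://abc.png), plus any spaces/tabs around it.
IMAGE_RE = re.compile(r"[ \t]*(!\[[^\]\n]*\]\([^)\s]+\))[ \t]*")


def isolate_markdown_images(raw: str) -> str:
    """Put each markdown image in its own paragraph, separated by blank lines.

    Assumption: the input is raw post text like the snippet above, where
    images sit on the same line as the surrounding <p> tags.
    """
    spaced = IMAGE_RE.sub(lambda m: "\n\n" + m.group(1) + "\n\n", raw)
    # Collapse runs of 3+ newlines (e.g. an image that was already isolated)
    # into a single blank line, so running the function twice changes nothing.
    spaced = re.sub(r"\n{3,}", "\n\n", spaced)
    return spaced.strip("\n")


if __name__ == "__main__":
    snippet = ('<p>Tell us.</p> ![](upload://a.png)<p class="x">OR</p>'
               ' ![](upload://b.png)<p>More.</p>')
    print(isolate_markdown_images(snippet))
```

Note that a blanket rewrite like this also pulls out images that were meant to be inline in a sentence, or that sit inside code, which is the kind of side effect discussed further down the thread.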

Is there something I can change in our setup to fix this? Or is this just a bug in the plugin?

Hey @mattdm,

Could you share a link to the original WP post and the Discourse post with the excerpt if they’re published publicly?

The WP Discourse plugin essentially just passes whatever HTML it finds in the WP post to Discourse, which then processes it for inclusion in the Discourse post. So how HTML from WordPress posts is displayed in Discourse comes down to two things:

  1. The original structure of the HTML in WordPress (for example).
  2. How Discourse’s markdown parser parses HTML for display in Discourse posts (for example).

So it’s either going to be a question of WordPress HTML structure, which is essentially outside the scope of WP Discourse or Discourse itself, or largely a question of opinion on what an HTML parser should do, although sometimes there’s a clear improvement that can be made on that front.

That said, it’s sometimes fruitful to explore the parsing weeds here, so if you have more details about the HTML structure of the WP post, do share and I’ll investigate further :slight_smile:

I had a similar problem on this Discourse post, created by the RSS polling plugin (not the WordPress plugin):

The ™ in that first line was actually an HTML <img> tag, and the RSS plugin did bring it across correctly. However, the post was broken when Discourse did its “download local copies of images” step:


Looks like @mattdm’s problem was the same:


@simonk Thanks for that useful investigation. Both of you could address the issue by entering your WP domains in the “disabled image download domains” site setting.
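The setting is normally changed through the admin settings page. If you manage several sites from a script, something like the following Python sketch could build the equivalent API call. It is built but not sent here. This rests on assumptions, not confirmed in this thread: that the Discourse admin API accepts `PUT /admin/site_settings/<setting>.json` with the new value in a form field of the same name, that the internal setting name is `disabled_image_download_domains`, and that list settings are pipe-separated. The function name, key and domains are placeholders.

```python
import urllib.parse
import urllib.request


def build_disabled_domains_request(base_url, api_key, api_username, domains):
    """Build (but do not send) a request that sets the image-download blocklist.

    Assumptions (not confirmed in this thread): the admin API route
    PUT /admin/site_settings/<setting>.json exists, takes the value in a form
    field named after the setting, and list values are joined with "|".
    """
    setting = "disabled_image_download_domains"
    body = urllib.parse.urlencode({setting: "|".join(domains)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/admin/site_settings/{setting}.json",
        data=body,
        headers={
            "Api-Key": api_key,            # placeholder admin API key
            "Api-Username": api_username,  # placeholder admin username
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="PUT",
    )


if __name__ == "__main__":
    req = build_disabled_domains_request(
        "https://forum.example.org", "YOUR_KEY", "system", ["blog.example.org"]
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to actually send it
```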

If you’re looking to go deeper into the issue, i.e. trying to handle this in some automated fashion, the difference between @mattdm’s case and yours shows how complex that is. He expects the browser’s treatment of HTML <p> and <img> elements to be carried over as a line break in the markdown, whereas you expect almost (not quite) the opposite, i.e. for the image to still be displayed inline, with the same dimensions as the <img> in the original HTML.
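To make that contrast concrete, here is a hypothetical Python sketch of the choice an HTML-to-markdown converter would face for each <img>. `Img` and `img_to_markdown` are illustrative names, not Discourse internals. The only Discourse-specific detail is the `![alt|WIDTHxHEIGHT](src)` form, which Discourse image markdown uses to carry dimensions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Img:
    """Hypothetical stand-in for an <img> element found in the source HTML."""
    src: str
    alt: str = ""
    width: Optional[int] = None
    height: Optional[int] = None


def img_to_markdown(img: Img, as_block: bool) -> str:
    # Discourse image markdown can carry dimensions as ![alt|WIDTHxHEIGHT](src).
    dims = f"|{img.width}x{img.height}" if img.width and img.height else ""
    md = f"![{img.alt}{dims}]({img.src})"
    # Block: blank lines around it, like a <p>-wrapped image in a browser.
    # Inline: no added whitespace, so it stays within the surrounding text.
    return f"\n\n{md}\n\n" if as_block else md


if __name__ == "__main__":
    # @mattdm's case: a full-width screenshot that should be its own block.
    print(repr(img_to_markdown(Img("upload://a.png"), as_block=True)))
    # @simonk's case: a small inline image that should keep its size.
    print(repr(img_to_markdown(Img("upload://tm.png", "tm", 20, 20), as_block=False)))
```

The `as_block` flag is the hard part: nothing in a bare <img> tag reliably says which behaviour the author wanted, which is why a single automatic rule is difficult.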

For more on this aspect of the issue, see the existing posts on it, e.g.


Thanks @simonk – yes, that’s exactly it.

@angus, what’s the consequence of this? No images at all, or the images cross-linked to the originals on the WordPress site?

I hope we can all agree that something which causes Discourse to show raw markdown rather than rendering it can’t possibly be right, though, especially when that markdown _was_ created by the plugin. Am I missing something here?

The images will be hotlinked to the originals.

Sorry, I’m confused; I thought your concern was about the line break? Are you saying the published version (i.e. what you see when reading, not editing) contains raw markdown, and that’s your concern?

The plugin just sends the raw HTML. The markdown is generated when the post is processed in Discourse.

Give the “disabled image download domains” setting a shot and see how you go.

Because of the missing line break, when Discourse displays the page (to users, not editors), instead of showing an image where ![](upload://def2zSzNAJtyuOorvTI2eV7rGW1.png) appears, you literally just see that text. Someone has to edit the post after it’s auto-posted and add a line break before each image for the images to actually show up.

I can do that, but I’d prefer the images to be downloaded: that’s a nice feature if the other site is down, if the blog changes, and so on. And we’re not near our hosting capacity, so I’m not concerned about the space.

I see, thanks for clarifying. Honestly, this is unlikely to change anytime soon, for the reasons discussed in the topic I linked above. For example:

The key thing for us is to maintain the integrity of the content, so the newline solution will probably not happen.

The problem is that trying to autocorrect for the issue you’re describing could create other issues. The current solution here is to add your WordPress domain to the “disabled image download domains” site setting.

Edit: I’ve made a small proposal on that front, but it’s speculative, and I’d reiterate that the current solution here is to disable image downloads for your WP domain.

I really don’t understand the fundamental point here. How is the integrity of the content being maintained by the current behavior? It’s clearly not.

It seems like if you’re going to replace HTML with markdown, replacing it with markdown that renders correctly is… the only right answer.

It’s not like we’re doing anything fancy on the WordPress side; it’s just the normal block editor and people adding images in the normal WordPress way.

I’ll go ahead and use the “disabled image download domains” setting to see if that helps us out, but I’d really, really prefer for this to just work.

I understand where you’re coming from, and I’m not saying that wouldn’t be ideal, but making a markdown-driven discussion engine also a perfect HTML rendering engine is not a simple proposition, even if individual cases seem superficially simple (e.g. just add a new line).

The ability to render the full HTML of a blog post in Discourse is a nice feature, but at the end of the day it’s one feature in a system that’s focused on markdown-formatted discussions.

If you follow the discussion I linked, including David’s recent post, you’ll see there are some possible avenues we could take here. If one of them pans out, it would address this particular case. If there were an easier solution, I’d definitely be making a PR to Discourse to address it.

Do let me know if you have issues with disabling image downloads, and I’ll work with you to address that too. I’ll let you know if we find a feasible path forward on the technical front.


Thanks! I appreciate it!

I’ve updated the setting and I’ll report how it works.