Ok, I’m going to respond to the issues you’re raising here separately. I understand why you’re connecting them, but hopefully you’ll see why they’re separate issues.
HTML entities in plain text email notifications
the nicest thing would be for the email messages to be multi-part with a clean-text rendered markdown text/plain and a separate text/html
This is actually how Discourse email notifications currently work. If you look at the “original” of a Discourse email notification you’ll see there is a text version and an HTML version.
What you seem to be saying, though I’m still not 100% clear on this, is that you’re getting HTML entities in the plain text version of Discourse email notifications, so that you see the raw HTML entities in the body of the email when viewing it in an email client that doesn’t support HTML. Is that what you’re saying? Could you share a screenshot of this from an email client (that doesn’t support HTML)?
If this is the case, this is an issue specific to Discourse email content generation and formatting, and it’d be best to split that off into a more targeted topic in support or bug.
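If a screenshot is awkward to get, one way to check this yourself is to save the “original” of a notification and look only at its text/plain part. The following is a rough sketch using Python’s standard email module; the sample message is made up for illustration and is not real Discourse output, and the helper name plain_text_entities is my own.

```python
import re
from email import message_from_string
from email.message import EmailMessage
from email.policy import default

# Matches named (&amp;), decimal (&#39;) and hex (&#x27;) entities.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def plain_text_entities(raw_email: str) -> list[str]:
    """Return the HTML entities found in the text/plain part of a raw email."""
    msg = message_from_string(raw_email, policy=default)
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return []
    return ENTITY_RE.findall(part.get_content())


# Illustrative multipart/alternative message (NOT real Discourse output).
sample = EmailMessage()
sample["Subject"] = "[Forum] Example topic"
sample.set_content("Tom &amp; Jerry said &quot;hi&quot;\n")
sample.add_alternative(
    "<p>Tom &amp; Jerry said &quot;hi&quot;</p>\n", subtype="html"
)

found = plain_text_entities(sample.as_string())
print(found)  # ['&amp;', '&quot;', '&quot;']
```

An empty list for a real notification would suggest the plain text part is clean and the entities are coming from somewhere else, e.g. how the client renders the HTML part.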
HTML in Discourse posts
You’re raising a relevant issue here, but from a technical perspective the question lies with how Discourse approaches imported content more broadly. The current default for imported content is HTML, not markdown.
Another context in which you can see this is the RSS Polling plugin, which, like the WP Discourse plugin, imports HTML into the post content. Note also that the embed support markdown site setting is off by default, and that there are various other site settings dealing with embedded HTML in posts (e.g. allowed embed selectors).
I’m partly guessing here, but the most likely reason(s) this strategic decision was taken in the early days of Discourse handling imported content was a combination of simplicity and fidelity, i.e. conversions from HTML to markdown will be imperfect. There is one key exception to this which I’ll mention below.
The WP Discourse plugin could attempt to convert the HTML of WordPress posts to markdown before sending them to Discourse. Yes, there are existing PHP libraries that convert HTML to markdown, but it’s never as simple as that when converting a markup language, particularly considering the different flavours of markdown.
Indeed, the WP Discourse plugin attempting to handle the conversion would actually be misguided, considering there is already a custom HtmlToMarkdown converter in Discourse. Currently this converter handles the conversion of HTML to markdown in emails imported into Discourse. If the HTML of posts from WordPress were to be converted to Discourse markdown it would need to be handled by that converter.
Currently the WP Discourse plugin uses the Discourse API to publish posts, i.e. the /posts endpoint. So essentially what you’re saying is that you want HtmlToMarkdown converter support to be added to the Discourse /posts endpoint (i.e. as an optional query param). You could advocate for this and if implemented the WP Discourse plugin would adopt it as an optional setting.
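To make the request concrete, here is a rough sketch of what such a call could look like from the client side. It only builds the request and doesn’t send anything. The convert_html_to_markdown param is purely hypothetical (it does not exist in Discourse today), and the helper name, URL and credentials are placeholders of mine.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(base_url, api_key, api_username, title, html,
                       convert_to_markdown=False):
    """Build (but do not send) a create-post request carrying HTML in `raw`."""
    url = base_url.rstrip("/") + "/posts.json"
    if convert_to_markdown:
        # HYPOTHETICAL param: this is the feature being asked for,
        # not something the Discourse API supports today.
        url += "?" + urlencode({"convert_html_to_markdown": "true"})
    payload = {"title": title, "raw": html}
    return Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Api-Key": api_key,
            "Api-Username": api_username,
        },
        method="POST",
    )


# Placeholder values for illustration only.
req = build_post_request(
    "https://forum.example.com", "PLACEHOLDER_KEY", "system",
    "Example post from WordPress", "<p>Hello <strong>world</strong></p>",
    convert_to_markdown=True,
)
print(req.get_method(), req.full_url)
```

The point of putting it on the Discourse side is that the plugin would keep sending the same HTML it sends now, and the existing HtmlToMarkdown converter would do the conversion when the flag is set.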