Publish Full Post Content: Convert HTML to Markdown in Discourse

I have a knowledge base post type in WordPress and would like to migrate all of these posts to Discourse, to be used with Discourse Docs + forum capabilities.

It seems to me that the easiest way to do this is to automatically create Topics in Discourse with WP Discourse. However, when I do this, the topic content is all in HTML, which makes it much harder to edit going forward.

Is it possible to have the plugin convert all of the HTML to Markdown? Or is there a better way to do this?

A related question: any images in the WP post end up as links to the files on the WordPress site. If I delete the post and its media library files, the links break. Is it possible to fully import the images, as if they had been inserted into Discourse directly?

Finally, is there an automated mechanism for syncing/publishing posts that already exist? I’d rather not have to open each one manually.

(I see that if Force Publish is on, I can bulk select them all, bulk edit, and set the post status to Publish. This changes nothing in WordPress, but it triggers the sync.)

The WP Discourse plugin is not designed to handle data migrations. You’ll encounter various issues, including the HTML-to-Markdown conversion you mentioned.

You need to do this via a backend data migration. If you’re hosted with Discourse.org, they can handle this for you as part of a hosting package.

If you’re self-hosted, you can give this a shot yourself if you’re keen. Discourse has a number of off-the-shelf migration scripts you can use. If you go down that track and need help, post in #dev and I’ll give you some advice.

Alternatively, you can hire someone in #marketplace to do it for you.

Thanks very much for the quick reply!

That makes sense. I’ll post in #dev to learn more about the WordPress migration scripts.

For anyone’s future reference, what I ended up doing was a semi-automated process.

I opened each post (about 120 of them), published it to Discourse, and then used this Chrome extension to convert the contents to Markdown.

MarkDownload - Markdown Web Clipper - Chrome Web Store (google.com)

Then I just copied that Markdown output, edited the topic in Discourse, and replaced the excerpt with the Markdown. I had to change a few settings in the extension's Markdown config, but it worked perfectly other than some code blocks and the need to update the URL for any internal links. I’ll also have to keep the media files in my WordPress Media Library, because that’s where all the image links point.

It only really worked because the posts had been created with the classic editor. If I converted them to blocks, the Markdown output was much worse. I suppose I could have copied from the front end instead, but the quality was better when copying directly from Edit Post.
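
The internal-link cleanup mentioned above could probably be scripted rather than done by hand. Below is a rough Python sketch, not something I actually ran during the migration. The `URL_MAP` entries, the example domains, and the function names are all hypothetical placeholders, and the regex only handles simple inline Markdown links (no reference-style links, no images wrapped in links). It rewrites links that appear in the map and separately lists image URLs, so you can see which files still depend on the WordPress Media Library.

```python
import re

# Hypothetical mapping from old WordPress post URLs to new Discourse topic URLs.
URL_MAP = {
    "https://wp.example.com/kb/install-guide/": "https://forum.example.com/t/install-guide/12",
}

# Matches simple inline Markdown links and images:
#   [text](url)   ![alt](url)   [text](url "title")
LINK_RE = re.compile(r'(!?\[[^\]\[]*\]\()([^)\s]+)((?:\s+"[^"]*")?\))')


def rewrite_links(markdown, url_map):
    """Swap link targets found in url_map; images and unknown URLs are left as-is."""
    def swap(match):
        prefix, url, suffix = match.groups()
        if prefix.startswith("!"):
            return match.group(0)  # image: keep pointing at the original file
        return prefix + url_map.get(url, url) + suffix

    return LINK_RE.sub(swap, markdown)


def image_urls(markdown):
    """Return the URLs of all inline images, in document order."""
    return [m.group(2) for m in LINK_RE.finditer(markdown) if m.group(1).startswith("!")]


if __name__ == "__main__":
    sample = (
        "See [guide](https://wp.example.com/kb/install-guide/) and "
        "![shot](https://wp.example.com/wp-content/uploads/a.png)."
    )
    print(rewrite_links(sample, URL_MAP))
    print(image_urls(sample))
```

You would still have to build the URL map yourself, for example from the topic URLs that WP Discourse records once each post is published.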