EDIT: mistake in my original post. I meant to refer to the cooked field, not the raw field (CORRECTED).
I recently acquired some JSON data from a Discourse forum, where the post data is in the “cooked” form. Is there any way to convert this back to Markdown? I am new to Discourse and have searched but can’t find a way to do this. Since the cooked data appears to be used to create the HTML, I am guessing an alternative route would be to use the function that converts cooked to HTML and then convert the HTML to Markdown.
If by raw you mean a field called raw, then you’re looking at the actual Markdown source that we store. For example, this is the JSON endpoint for your last post just now.
The raw field there is the actual text you composed in the Markdown editor, and we store it as-is so it doesn’t get more pure than that.
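To make the two fields concrete, here is a minimal sketch of pulling them out of a post's JSON. The payload is a hand-trimmed sample rather than a real response (a real one carries many more fields), and the helper name `markdown_and_html` is made up for illustration.

```python
import json
from typing import Optional, Tuple


def markdown_and_html(post_json: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (raw, cooked) from a post's JSON text.

    Either value is None if the field is missing, which can happen
    when an export only kept the rendered form.
    """
    post = json.loads(post_json)
    return post.get("raw"), post.get("cooked")


# Hand-trimmed sample payload (an assumption, not a full API response).
sample = json.dumps({
    "id": 2203,
    "raw": "@merefield, it looks like @eloy has mentioned that their favourite colour is red!",
    "cooked": '<p><a class="mention" href="/u/merefield">@merefield</a>, it looks like '
              '<a class="mention" href="/u/eloy">@eloy</a> has mentioned that their '
              'favourite colour is red!</p>',
})

raw, cooked = markdown_and_html(sample)
print("raw:   ", raw)     # the Markdown as typed in the composer
print("cooked:", cooked)  # the HTML rendered from it
```

If `raw` is present in your data you can use it directly and skip any conversion.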
Instead, if you generally mean “the raw HTML” as scraped without using JSON endpoints, then you can turn that HTML into Markdown externally with pandoc as suggested above, or any other software.
Please accept my apologies, I made a mistake in my first post (since corrected). I meant to refer to the cooked data as opposed to the raw data (it’s been a long day…).
What form is the cooked data in and is there any way to convert it to Markdown or HTML? Thanks.
Ah, that makes more sense. The cooked field is the HTML rendered from Markdown.
You can simply run that through pandoc to get Markdown. You won’t get full fidelity to the corresponding raw, because some non-standard Markdown tags like [quote] are rendered to particular HTML patterns, but if you simply need the content as Markdown, pandoc should work well enough.
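A rough sketch of the pandoc route from a script, assuming pandoc is installed and on your PATH. The helper names `pandoc_command` and `cooked_to_markdown` are made up for illustration, and `gfm` (GitHub-flavoured Markdown) is just one possible output format; plain `markdown` works too.

```python
import shutil
import subprocess
from typing import List


def pandoc_command(to_format: str = "gfm") -> List[str]:
    """Build the pandoc invocation: read HTML on stdin, write Markdown to stdout."""
    return ["pandoc", "--from", "html", "--to", to_format]


def cooked_to_markdown(cooked: str, to_format: str = "gfm") -> str:
    """Convert a post's cooked HTML to Markdown by piping it through pandoc."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    result = subprocess.run(
        pandoc_command(to_format),
        input=cooked,          # cooked HTML goes in on stdin
        capture_output=True,
        text=True,
        check=True,            # raise if pandoc exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    cooked = '<p><a class="mention" href="/u/merefield">@merefield</a>, hello!</p>'
    if shutil.which("pandoc"):
        print(cooked_to_markdown(cooked))
    else:
        print("pandoc not installed; would run:", " ".join(pandoc_command()))
```

Mentions come back as ordinary Markdown links rather than the bare @name you typed, which is the kind of fidelity loss to expect.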
#<Post:0x00007fbb78416f50
id: 2203,
user_id: -4,
topic_id: 590,
post_number: 6,
raw: "@merefield, it looks like @eloy has mentioned that their favourite colour is red!",
cooked:
"<p><a class=\"mention\" href=\"/u/merefield\">@merefield</a>, it looks like <a class=\"mention\" href=\"/u/eloy\">@eloy</a> has mentioned that their favourite colour is red!</p>",
created_at: Sun, 18 Aug 2024 11:15:32.487912000 UTC +00:00,
updated_at: Sun, 18 Aug 2024 11:15:32.487912000 UTC +00:00,