## Context
I’m trying to retrieve specific posts to turn them into a single corpus (e.g., a book) to be processed elsewhere (e.g., with Pandoc).
## ‘Simple’ Approach
Discourse provides two readily accessible ways to download specific posts:
- Appending `.json` to a URL gives a JSON representation of the topic that includes the `cooked` HTML version of each post.
- Taking the topic/post numbers from the original URL and replacing the prefix with `/raw` gives the original Markdown version, e.g., from `https://discourse.example/t/some-topic/123/4` to `https://discourse.example/raw/123/4`.
Using the second approach, I list topic numbers in a file and fetch the Markdown input. This works fine as long as there are no hyperlinks or attached files.
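Concretely, that loop could look like the minimal sketch below; the `topics.txt` name, its one-topic-number-per-line format, and the output naming are my assumptions, not anything Discourse prescribes:

```python
# Fetch the raw Markdown of each topic listed in topics.txt
# (one topic number per line) and save it as <topic>.md.
from pathlib import Path
from urllib.request import urlopen

BASE = "https://discourse.example"  # replace with your instance

for line in Path("topics.txt").read_text().splitlines():
    topic = line.strip()
    if not topic:
        continue
    # /raw/<topic> returns the topic's posts as raw Markdown;
    # append /<post-number> to fetch a single post instead.
    with urlopen(f"{BASE}/raw/{topic}") as resp:
        Path(f"{topic}.md").write_bytes(resp.read())
```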
## API Approach?
Two problems to solve:
- relative links must be turned into absolute links if we want them clickable outside Discourse;
- `upload://` links must be converted to absolute links as well, or to relative links if the assets are downloaded (a sketch of both conversions follows this list).
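Lacking a built-in view, a first client-side pass might look like this. The regex for relative links is deliberately crude, and the `/uploads/lookup-urls` endpoint (which, as far as I can tell, the composer uses to resolve `upload://` short URLs), its `short_urls[]` parameter, and the response shape are assumptions to verify against your instance:

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE = "https://discourse.example"  # replace with your instance

def absolutize_links(markdown: str) -> str:
    # Prefix site-relative Markdown link targets like (/t/other-topic/42)
    # with the base URL; external and upload:// links are left untouched.
    return re.sub(r"\]\((/[^)]+)\)", rf"]({BASE}\1)", markdown)

def resolve_uploads(markdown: str) -> str:
    # Collect upload:// short URLs and ask the server for the real URLs.
    shorts = sorted(set(re.findall(r"upload://[\w.-]+", markdown)))
    if not shorts:
        return markdown
    data = urlencode([("short_urls[]", s) for s in shorts]).encode()
    req = Request(f"{BASE}/uploads/lookup-urls", data=data)
    # Depending on the instance, this may need Api-Key/Api-Username headers.
    with urlopen(req) as resp:
        rows = json.load(resp)  # assumed: [{"short_url": ..., "url": ...}, ...]
    for row in rows:
        markdown = markdown.replace(row["short_url"], row["url"])
    return markdown
```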
The JSON file produced by the first approach does not seem to allow this conversion (at least not obviously; maybe there's a way to re-parse the Markdown input and loop over the list of links), so I'm wondering whether the API can do it, i.e., retrieve the original Markdown but with externally usable URLs. In fact, I wonder if such a view could be made available the way `raw`
is. This would immensely simplify reusing content outside Discourse, or importing content into another Discourse instance (think [fediverse]).
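For comparison, the `cooked` HTML (where `upload://` links are already resolved to real URLs) is easy to pull from the topic JSON of the first approach; a sketch, ignoring pagination on long topics:

```python
import json
from urllib.request import urlopen

BASE = "https://discourse.example"  # replace with your instance

def cooked_posts(topic_id: int) -> list[str]:
    # The topic JSON carries each post's cooked HTML under
    # post_stream.posts; long topics need extra requests for later pages.
    with urlopen(f"{BASE}/t/{topic_id}.json") as resp:
        topic = json.load(resp)
    return [post["cooked"] for post in topic["post_stream"]["posts"]]
```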
I could go for the HTML and work from there, but it seems much cleaner to process the Markdown files (also because other sources we want to compose with might use Markdown). Any suggestions on ways to approach this problem, and possible courses of action to understand this issue better or make it work in future versions, are welcome!