I’m trying to retrieve specific posts to turn them into a single corpus (e.g., a book) to be processed elsewhere (e.g., with Pandoc).
Discourse provides two readily accessible ways to download specific posts:
- Appending `.json` to a URL gives its JSON representation, which includes the post content.
- Using the topic/post numbers from the original URL and replacing the prefix with `/raw` gives the original Markdown version, e.g., from `/t/<slug>/<topic-id>` to `/raw/<topic-id>`; both endpoints are sketched below.
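For concreteness, here is a minimal Python sketch of both methods; the instance URL and topic number are placeholders, and I'm using the `requests` library:

```python
import requests

BASE = "https://discourse.example.com"  # hypothetical instance URL
TOPIC_ID = 12345                        # hypothetical topic number

# (1) JSON representation of the topic (first page of posts)
topic_json = requests.get(f"{BASE}/t/{TOPIC_ID}.json").json()

# (2) raw Markdown of the topic via the /raw prefix
raw_markdown = requests.get(f"{BASE}/raw/{TOPIC_ID}").text

print(topic_json["title"])
print(raw_markdown[:200])
```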
Using the second approach, I list topic numbers in a file and fetch the Markdown input. This works fine as long as a post contains no hyperlinks or attached files.
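Here is roughly what that batch step looks like; the `topics.txt` file name and the one-number-per-line format are just my conventions:

```python
import requests

BASE = "https://discourse.example.com"  # hypothetical instance URL

with open("topics.txt") as f:           # one topic number per line
    topic_ids = [line.strip() for line in f if line.strip()]

for topic_id in topic_ids:
    md = requests.get(f"{BASE}/raw/{topic_id}").text
    with open(f"{topic_id}.md", "w", encoding="utf-8") as out:
        out.write(md)
```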
Two problems to solve:
- relative links must be turned into absolute links if we want them clickable outside Discourse (see the sketch after this list);
- `upload://` links must be converted to absolute links as well, or to relative links if the assets are downloaded.
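The first conversion seems mechanical. A minimal sketch, assuming internal links are written as site-relative inline links starting with `/` (which may not cover every case):

```python
import re

BASE = "https://discourse.example.com"  # hypothetical instance URL

def absolutize_links(md: str) -> str:
    """Prefix site-relative Markdown link targets with the instance URL."""
    return re.sub(r"\]\((/[^)\s]+)\)", rf"]({BASE}\1)", md)

print(absolutize_links("see [this post](/t/some-topic/123)"))
# -> see [this post](https://discourse.example.com/t/some-topic/123)
```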
The JSON file produced in (1) does not seem to allow this conversion, at least not obviously; maybe there is a way to re-parse the Markdown input and loop over the links list. So I'm wondering whether the API can do it, i.e., retrieve the original Markdown but with externally usable URLs. In fact, I wonder whether such a view could be made available the same way `/raw` is. This would immensely simplify reusing content outside Discourse, or importing content into another Discourse instance (think fediverse).
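For the `upload://` problem, the closest I can get from the JSON alone is a heuristic: pair the `upload://` references in a post's raw Markdown with the resolved `/uploads/` URLs in its `cooked` HTML. This is a sketch only; it assumes both lists come out in the same order, and it only walks the first page of posts returned by the topic endpoint:

```python
import re
import requests

BASE = "https://discourse.example.com"  # hypothetical instance URL

def resolve_uploads_in_topic(topic_id: int) -> list[str]:
    """Return the raw Markdown of each post with upload:// links replaced."""
    topic = requests.get(f"{BASE}/t/{topic_id}.json").json()
    results = []
    for post in topic["post_stream"]["posts"]:
        # /posts/<id>.json carries the raw Markdown of a single post
        raw = requests.get(f"{BASE}/posts/{post['id']}.json").json()["raw"]
        shorts = re.findall(r"upload://[\w.\-]+", raw)
        # resolved /uploads/ URLs as they appear in the rendered HTML
        fulls = re.findall(r'(?:src|href)="([^"]*/uploads/[^"]+)"', post["cooked"])
        # naive positional pairing: an unverified assumption on my part
        for short, full in zip(shorts, fulls):
            raw = raw.replace(short, full)
        results.append(raw)
    return results
```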
I could go for the HTML and work from there, but it seems much cleaner to process the Markdown files (especially since other sources we may want to compose with might also use Markdown). Any suggestions on how to approach this problem, or possible courses of action to understand the issue better or make it work in future versions, are welcome!