How to iterate over all the topics in order to export them as Markdown?

Hi, folks. I am looking at options for archiving content in a Discourse site. I have found the thread on creating and serving a WARC, which gives me something to work with, but I’d really like to export all the topics/threads/whatever as Markdown.

I read this: Export topic as markdown

Now I’d like to know how to iterate over all the topic URLs so that I could turn them into /raw/ URLs and download all the topic threads as Markdown. Is there some easy way to get a list of all the topic URLs on the site? Do I need to pop open a Rails console? Is there a single Ruby class that can enumerate all the topic URLs? Something?

Many thanks.

I got there, but I would still love to hear about easier paths.

  1. Use Discourse admin to download a backup of the site.
  2. Find the PostgreSQL database dump inside the backup file, then restore that to a local database.
  3. Run select id from topics; and save the resulting IDs to a file.
  4. Use sed or any of its cousins to turn each topic ID into a URL of the form https://my-discourse-site/raw/<topic ID>
  5. for...; do wget $url; done
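
For anyone who would rather not chain sed and wget, steps 3 to 5 could be sketched in Python roughly like this. It is only a sketch: the function names and the output layout are made up for illustration, and it assumes the ID file is raw psql output, so it skips the header, separator, and "(N rows)" lines.

```python
import urllib.request
from pathlib import Path


def load_topic_ids(lines):
    """Keep only lines that are bare integers (drops psql header/footer lines)."""
    ids = []
    for line in lines:
        text = line.strip()
        if text.isdigit():
            ids.append(int(text))
    return ids


def raw_url(base_url, topic_id):
    """Build the /raw/ URL for one topic ID."""
    return f"{base_url.rstrip('/')}/raw/{topic_id}"


def download_topics(base_url, topic_ids, out_dir):
    """Save each topic's raw Markdown as <out_dir>/<topic id>.md (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for topic_id in topic_ids:
        with urllib.request.urlopen(raw_url(base_url, topic_id)) as resp:
            (out / f"{topic_id}.md").write_bytes(resp.read())
```

Usage would be something like `download_topics("https://my-discourse-site", load_topic_ids(open("topic_ids.txt")), "export")`, where topic_ids.txt is whatever file the IDs were saved to. A short pause between requests is probably wise on a live site.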

  1. Retrieve the category JSON, for example https://meta.discourse.org/c/support.json (which redirects to https://meta.discourse.org/c/support/6.json)
  2. Get the first batch of topic IDs from the topic_list.topics array in that JSON.
  3. Retrieve topic_list.more_topics_url and go back to step 2, repeating until the response no longer contains a more_topics_url.
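
The loop above might look something like this in Python. Treat it as a sketch under assumptions: the function names are hypothetical, and it assumes more_topics_url can come back as a relative path without the .json suffix, so the helper adds the suffix when it is missing. It also skips IDs it has already seen, in case a topic shows up on more than one page.

```python
import json
import urllib.request
from urllib.parse import urljoin, urlsplit, urlunsplit


def fetch_json(url):
    """Fetch a URL and decode the body as JSON (urllib follows redirects)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def as_json_url(base_url, path):
    """Resolve path against the site and make sure the path part ends in .json."""
    parts = urlsplit(urljoin(base_url, path))
    new_path = parts.path if parts.path.endswith(".json") else parts.path + ".json"
    return urlunsplit(parts._replace(path=new_path))


def iter_category_topic_ids(base_url, category_path, fetch=fetch_json):
    """Yield each topic ID in a category, following more_topics_url page by page."""
    url = as_json_url(base_url, category_path)
    seen_ids = set()
    visited = set()
    while url and url not in visited:  # stop on a missing or repeated page URL
        visited.add(url)
        topic_list = fetch(url).get("topic_list", {})
        for topic in topic_list.get("topics", []):
            if topic["id"] not in seen_ids:
                seen_ids.add(topic["id"])
                yield topic["id"]
        more = topic_list.get("more_topics_url")
        url = as_json_url(base_url, more) if more else None
```

For example, `list(iter_category_topic_ids("https://meta.discourse.org", "/c/support.json"))` would collect the IDs for the support category. The fetch parameter is only there so the paging logic can be tried against canned data without touching the network.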

Thank you for this. How would I then iterate over the categories?

Request /site.json and iterate over the categories array.
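
A rough Python sketch of that, with hypothetical function names. It assumes each entry in the categories array carries id and slug fields and that /c/<slug>/<id>.json resolves for every category, subcategories included; both assumptions are worth checking against your own site before relying on them.

```python
import json
import urllib.request


def fetch_json(url):
    """Fetch a URL and decode the body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def category_json_paths(site):
    """Return one /c/<slug>/<id>.json path per entry in site['categories']."""
    return [
        f"/c/{cat['slug']}/{cat['id']}.json"
        for cat in site.get("categories", [])
    ]
```

Calling `category_json_paths(fetch_json("https://meta.discourse.org/site.json"))` should then give one path per category, and each path can be paged through in the way described earlier in the thread.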
