Discourse Archive Tools

I needed a way to archive my Discourse forum on GitHub, but many of the available scripts were outdated or broken, so I wrote some simple scripts that meet my requirements. They are not perfect and still have a few minor issues, but they serve my current needs, and I don't have much more time to spend on this project.

  • Archive new posts as JSON.
  • Render topics to Markdown files.
  • Support for multiple Discourse sites (processed one site at a time).
  • Separate metadata tracking per site (last synchronization date and archived post IDs).
  • Concurrent rendering of topics using a thread pool for improved performance.
  • Exponential backoff for HTTP requests to handle rate limits or transient errors.
  • Archive Posts: Saves each Discourse post in a JSON file, organized by creation date.
  • Concurrent Rendering: Renders topics concurrently, converting posts from HTML to Markdown.
  • Image Downloading: Processes HTML to download images and rewrites image URLs to relative paths.
  • Metadata Updating: Keeps track of archived posts to avoid duplicates.
  • Incremental README Updates: Updates a README.md with a table of contents for easy navigation.
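
To make two of the items above concrete, here is a minimal sketch of exponential backoff and of duplicate-free archiving by creation date. It is not taken from the actual scripts: the function names `with_backoff` and `archive_post`, the `YYYY/MM/DD/<id>.json` layout, and the retry defaults are all assumptions chosen for illustration. It only assumes that each post is a dict with an `id` and an ISO-formatted `created_at` field.

```python
import json
import time
from pathlib import Path


def with_backoff(fetch, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on failure wait base_delay * 2**attempt, then retry.

    Hypothetical helper. Real code would catch only network or
    rate-limit errors instead of every Exception.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


def archive_post(root, post, archived_ids):
    """Write one post as JSON under root/YYYY/MM/DD/, skipping known IDs.

    Returns the written path, or None if the post was already archived.
    """
    if post["id"] in archived_ids:
        return None
    # Assumed layout: the date part of an ISO timestamp such as
    # "2024-05-01T12:34:56.000Z" becomes the directory path 2024/05/01.
    year, month, day = post["created_at"][:10].split("-")
    path = Path(root) / year / month / day / f"{post['id']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(post, indent=2), encoding="utf-8")
    archived_ids.add(post["id"])  # metadata that prevents duplicates
    return path
```

In this sketch the set of archived IDs plays the role of the per-site metadata: persisting it between runs (for example as a small JSON file per site) is what would let a later run skip posts it has already saved.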

Here is an example of a forum archived on GitHub:
https[://]github[.]com/c0mmando/forum.hackliberty.org


Your example link has extra “[” and “]” brackets. I removed them in the link below.

https://github.com/c0mmando/forum.hackliberty.org (GitHub - c0mmando/forum.hackliberty.org: Full archive of forum.hackliberty.org)