I needed a way to archive my Discourse forum on GitHub, but many of the available scripts were outdated or broken, so I wrote a few simple scripts that meet my requirements. They’re not perfect and still have a few minor issues, but they serve my current needs, and I don’t have much more time to spend on this project.
- Archive new posts as JSON.
- Render topics to Markdown files.
- Support for multiple Discourse sites (processed one site at a time).
- Separate metadata tracking per site (last synchronization date and archived post IDs).
- Concurrent rendering of topics using a thread pool for improved performance.
- Exponential backoff for HTTP requests to handle rate limits or transient errors.
- Archive Posts: Saves each Discourse post in a JSON file, organized by creation date.
- Concurrent Rendering: Renders topics concurrently, converting posts from HTML to Markdown.
- Image Downloading: Processes HTML to download images and rewrites image URLs to relative paths.
- Metadata Updating: Keeps track of archived posts to avoid duplicates.
- Incremental README Updates: Updates a `README.md` with a table of contents for easy navigation.
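
To illustrate the exponential backoff item in the list above, here is a minimal sketch using only the Python standard library. It is not the code from the scripts themselves: the names `backoff_delays` and `fetch_with_backoff`, the default delays, and the choice to retry on HTTP 429 and 5xx responses are all assumptions made for this example.

```python
import time
import urllib.error
import urllib.request


def backoff_delays(base=1.0, factor=2.0, retries=5, cap=60.0):
    """Yield the wait before each retry: base, base*factor, ... capped at `cap`."""
    for attempt in range(retries):
        yield min(cap, base * factor ** attempt)


def fetch_with_backoff(url, opener=urllib.request.urlopen, sleep=time.sleep, **delay_options):
    """Fetch `url`, retrying on HTTP 429/5xx and network errors (hypothetical helper).

    `opener` and `sleep` are injectable so the retry logic can be exercised
    without real network access or real waiting.
    """
    last_error = None
    # The first attempt happens immediately; each later attempt waits longer.
    for delay in [0.0, *backoff_delays(**delay_options)]:
        if delay:
            sleep(delay)
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as exc:
            # Assumption: only rate limiting (429) and server errors are retryable.
            if exc.code != 429 and exc.code < 500:
                raise
            last_error = exc
        except urllib.error.URLError as exc:
            # Connection problems are treated as transient.
            last_error = exc
    raise last_error
```

With the defaults, a request that keeps failing is retried after 1, 2, 4, 8, and 16 seconds before the last error is raised. The sketch leaves out random jitter, which is often added so that several clients don't retry in lockstep.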
Here is an example of a forum archived on GitHub:
https[://]github[.]com/c0mmando/forum.hackliberty.org