Archival tool updated with Codex, May 2026
It turns out to be pretty tricky to save an entire Discourse site as a static version. According to this post by Jeff Atwood, it’s “much harder than you’d think”. It doesn’t appear to be a priority for the Discourse team either, which is perfectly understandable.
For my purposes, though, I found that I really needed some way to generate basic, static HTML versions of my Discourse sites. I’ve been using Discourse for a couple of years now as a discussion board when teaching my college math classes, so every few months I retire one or two sites and start one or two more. The discussions on the retiring sites have value, so I needed some way to save them. Ultimately, I figured I’d build my own tool.
The basic idea is simple: use the Discourse API to crawl the site, grab the cooked (already rendered) version of each post, and massage that into HTML. The tool focuses largely on my own needs as a college math professor who uses small Discourse forums to support my math classes. As such, mathematical content, like $f(x)=e^{-x^2}$, is automatically typeset with MathJax V4, and fenced code blocks tagged as sage are translated to active Sage Cells.
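To make the crawl-and-render idea concrete, here is a minimal Python sketch. It is not the tool's actual code. It assumes the public `/t/<id>.json` topic endpoint, whose `post_stream.posts` entries carry a `cooked` HTML field, and it assumes cooked HTML marks sage blocks as `<code class="lang-sage">`. The function names and the `sage` CSS class are hypothetical, chosen only for illustration.

```python
import html
import json
import re
from urllib.request import urlopen

# Assumption: a fenced block tagged "sage" is cooked into this shape.
SAGE_BLOCK = re.compile(
    r'<pre[^>]*><code class="lang-sage">(.*?)</code></pre>', re.DOTALL
)


def fetch_topic(base_url: str, topic_id: int) -> dict:
    """Download one topic as JSON (hypothetical helper; needs network access)."""
    with urlopen(f"{base_url}/t/{topic_id}.json") as response:
        return json.load(response)


def sage_blocks_to_cells(cooked: str) -> str:
    """Rewrite sage code blocks as script containers a Sage Cell page can pick up."""

    def to_cell(match: re.Match) -> str:
        # Cooked code is HTML-escaped; a script body needs the raw source.
        source = html.unescape(match.group(1))
        return f'<div class="sage"><script type="text/x-sage">{source}</script></div>'

    return SAGE_BLOCK.sub(to_cell, cooked)


def topic_to_html(topic: dict) -> str:
    """Wrap every post's cooked HTML in a minimal static page."""
    title = html.escape(topic["title"])
    parts = [
        "<!DOCTYPE html>",
        f'<html><head><meta charset="utf-8"><title>{title}</title></head><body>',
        f"<h1>{title}</h1>",
    ]
    for post in topic["post_stream"]["posts"]:
        number = post["post_number"]
        author = html.escape(post["username"])
        parts.append(f'<article id="post-{number}">')
        parts.append(f"<h2>{author}</h2>")
        # The cooked field is already HTML, so it is inserted without escaping.
        parts.append(sage_blocks_to_cells(post["cooked"]))
        parts.append("</article>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Calling `topic_to_html(fetch_topic(base_url, topic_id))` for each topic and writing the result to a file would give one static page per topic. A real crawler also has to list the topics, follow long topics past the first batch of posts, rewrite internal links, and load MathJax and the Sage Cell script in the page head, none of which is shown here.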
If you’re interested, you can view:
- A small portion of Discourse Meta,
- The forum for my Math for Machine Learning class, and/or
- The GitHub Repository.
Note
The update of the archival tool was performed largely with Codex.