A basic Discourse archival tool

Archival tool updated with Codex May 2026

It seems that it’s pretty tricky to save an entire Discourse site as a static version. According to this post by Jeff Atwood, it’s “much harder than you’d think”. Static archiving doesn’t appear to be a priority for the Discourse team, either, which is perfectly understandable.

For my purposes, though, I found that I really needed some way to generate basic, static HTML versions of my Discourse sites. I’ve been using Discourse for a couple of years now as a discussion board when teaching my college math classes, so every few months I retire one or two sites and start one or two more. The discussions on the retiring sites have value, so I needed some way to save them. Ultimately, I figured I’d build my own tool.

The basic idea is simple: use the Discourse API to crawl the site, grab the cooked (already rendered to HTML) version of each post, and massage that into a static HTML page. The tool focuses largely on my own needs as a college math professor who uses small Discourse forums to support my math classes. As such, mathematical content, like $f(x)=e^{-x^2}$, should be automatically typeset with MathJax v4, and fenced code blocks tagged as sage are translated to active Sage Cells.
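To make the crawl-and-render idea concrete, here is a minimal sketch in Python. It is not the tool's actual code. It assumes a public site whose `/latest.json` and `/t/<id>.json` endpoints can be read without an API key, and the function names (`fetch_json`, `render_topic`, `archive_site`) are made up for illustration. It also ignores pagination of long topics and of the topic list, rate limiting, and the MathJax and Sage Cell handling described above.

```python
import html
import json
import urllib.request


def fetch_json(url):
    """Fetch a URL and decode its JSON body (no auth, no retries)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def render_topic(topic):
    """Turn one topic's JSON (shaped like /t/<id>.json) into a static page."""
    title = html.escape(topic["title"])
    parts = [
        "<!DOCTYPE html>",
        f'<html><head><meta charset="utf-8"><title>{title}</title></head><body>',
        f"<h1>{title}</h1>",
    ]
    for post in topic["post_stream"]["posts"]:
        author = html.escape(post["username"])
        # "cooked" is HTML already rendered by Discourse, so it is inserted as-is.
        parts.append(f"<article><h2>{author}</h2>")
        parts.append(post["cooked"])
        parts.append("</article>")
    parts.append("</body></html>")
    return "\n".join(parts)


def archive_site(base_url):
    """Render every topic listed on the first page of /latest.json."""
    latest = fetch_json(f"{base_url}/latest.json")
    pages = {}
    for summary in latest["topic_list"]["topics"]:
        topic = fetch_json(f"{base_url}/t/{summary['id']}.json")
        pages[summary["id"]] = render_topic(topic)
    return pages
```

Calling `archive_site` with a forum's base URL (for example, a hypothetical `https://forum.example.edu`) would return a dictionary mapping topic IDs to HTML strings, which could then be written to disk one file per topic. Keeping `render_topic` separate from the network code makes the HTML step easy to try on saved JSON.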

If interested, you can view

Note

The update of the archival tool was performed largely with Codex.