A basic Discourse archival tool

mcmcclur · May 12, 2017, 8:20pm

It seems that it’s pretty tricky to save an entire Discourse site as a static version. According to this post by Jeff Atwood, it’s “much harder than you’d think”. It doesn’t appear that this is a priority for the Discourse team, either, which is perfectly understandable.

For my purposes, though, I found that I really needed some way to generate basic, static HTML versions of my Discourse sites. I’ve been using Discourse for a couple of years now as a discussion board when teaching my college math classes, so every few months I retire one or two sites and start one or two more. Obviously, the discussions on the retiring sites have value, so I really needed some way to save them. Ultimately, I figured I’d build my own tool.

The basic idea is simple: rather than scanning the HTML and crawling the site over HTTP, I figured I’d use the Discourse API to crawl the site. You can view the result of applying the tool to this Discourse Meta on my webpage.

Before looking at it, though, please temper your expectations. I’m a college math professor, not a professional web developer. And, while I’d like it to look pretty nice, I’m mainly interested in simplicity. My guess is that most folks here would consider this to be a proof of concept, rather than a serious, working tool. Taking that into account, here are some features/limitations:

The code grabs the site logo and places it in a fixed banner at the top. If no site logo is found, it falls back to the logo at the top of Meta.
It uses the API to grab the topic list and generates a new page for each topic. You can limit the number of times it follows more_topics_url.
There is a single main page that links to those topics.
MathJax is important for my needs so every page loads and configures MathJax.
There is no other JavaScript, and no plugins are considered.
There are no user pages or category pages.
It’s not very configurable without messing with the code directly.
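
To give a rough idea of the topic-list step, here is a minimal sketch, not the actual tool. It assumes that /latest.json returns a topic_list object holding topics and, when there are more, a more_topics_url, and that adding .json to the path of more_topics_url gives the JSON version of the next page. The names crawl_topics, to_json_path and index_html, the max_more parameter, and the t/<id>.html file layout are all made up for illustration.

```python
import json
from html import escape
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode the JSON body (no auth, no retries)."""
    with urlopen(url) as response:
        return json.load(response)


def to_json_path(more_topics_url):
    """Turn '/latest?page=1' into '/latest.json?page=1' (assumed convention)."""
    path, sep, query = more_topics_url.partition("?")
    if not path.endswith(".json"):
        path += ".json"
    return path + sep + query


def crawl_topics(base_url, max_more=3, fetch=fetch_json):
    """Collect topics from /latest.json, following more_topics_url
    at most max_more times."""
    topics = []
    url = urljoin(base_url, "/latest.json")
    for _ in range(max_more + 1):  # first page plus max_more follow-ups
        topic_list = fetch(url)["topic_list"]
        topics.extend(topic_list["topics"])
        more = topic_list.get("more_topics_url")
        if not more:
            break  # last page reached
        url = urljoin(base_url, to_json_path(more))
    return topics


def index_html(topics):
    """Build a bare main page linking to one static page per topic.
    The t/<id>.html layout is hypothetical."""
    items = "\n".join(
        f'<li><a href="t/{t["id"]}.html">{escape(t["title"])}</a></li>'
        for t in topics
    )
    return f"<ul>\n{items}\n</ul>"
```

Something like crawl_topics("https://meta.discourse.org", max_more=2) would then return the topics from the first three pages of the list. The fetch argument is only there so the paging logic can be tried out without touching the network.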

In spite of all the limitations, it’s sufficient for my needs and I’m rather happy with it. I have no particular plans to expand it, other than incrementally as needed. If anyone is interested, the code (which is Python) is available here:

Perhaps someone will push it further, or just be inspired by the idea?
