Any updates on the best way to create an HTML archive of a static site?

justin_gordon · July 10, 2025, 11:10pm

Update!

This might be the answer:

I looked at:

Improving Discourse static HTML archive.

It’s old.

I’m going to retire https://forum.talksurf.com/.

Yes, I’m going to archive a backup.

But what if I just want some browseable HTML files?

Should I just run archive-discourse.py from the kitsandkats/ArchiveDiscourse repository on GitHub?

Or is there something better?

Thanks in advance!

CC: @pfaffman

Aloha,

Justin

NateDhaliwal · July 10, 2025, 11:19pm

Would something like the Wayback Machine work for this?

justin_gordon · July 11, 2025, 12:00am

This worked. I had to make a slight code update.

pfaffman · July 11, 2025, 10:36pm

But not much older than your Discourse version!

I have had some luck mirroring sites with wget. Something like:

wget --mirror --page-requisites --convert-links --adjust-extension --compression=auto --reject-regex "/search" --no-if-modified-since --no-check-certificate --execute robots=off --random-wait --wait=1 --user-agent="Googlebot/2.1 (+http://www.google.com/bot.html)" --no-cookies --header "Cookie: _t=$COOKIE" https://forum.talksurf.com/

But you need to get the value of the cookie named _t and set it as $COOKIE first.

Send me an email and I’ll see what I can do.

翔_贺 · July 14, 2025, 1:50am

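I’ve been doing this recently, and this is how I did it.

  # Serve a pre-generated static HTML file from the public directory.
  def serve
    public_root = File.expand_path("../../public", __dir__)
    file_path = File.expand_path("#{params[:path]}.#{params[:format]}", public_root)

    # Reject paths that escape the public directory (e.g. via "..").
    if file_path.start_with?(public_root + File::SEPARATOR) && File.file?(file_path)
      send_file file_path, type: "text/html", disposition: "inline"
    else
      render plain: "404 Not Found", status: 404
    end
  end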

justin_gordon · July 15, 2025, 12:11am

Just to let you know, this does not pull the images with new URLs. The photos will still point to your server (which is about to be decommissioned!).
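A quick way to check whether an archived page still depends on the old server is to list the image URLs that point at it. This is only a minimal sketch: the helper name `remote_image_urls` is hypothetical, and it assumes images appear as `<img ... src="...">` tags with double-quoted absolute URLs.

```ruby
# Host of the forum being retired (taken from the thread).
OLD_HOST = "https://forum.talksurf.com"

# Hypothetical helper: returns the unique <img> src URLs in `html`
# that still point at `host`. Assumes double-quoted src attributes.
def remote_image_urls(html, host = OLD_HOST)
  html.scan(/<img[^>]*\ssrc="(#{Regexp.escape(host)}[^"]*)"/).flatten.uniq
end
```

If the list is empty for every page, the archive should not need the old server for its images.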

justin_gordon · July 15, 2025, 12:18am

Jay kindly sent me the dump, and I compared it to mine.

His technique works better in the sense that it saves the images.

However, his internal links point to the decommissioned site rather than to the archived articles. Still, the articles are all there, with their images.

It would be a “nice to have” if Discourse supported a static export.

supermathie · July 15, 2025, 12:26am

The good thing is that you have all the data, so one could be written that exported the data directly from a backup if anyone had the inclination to do so.

But we’re not likely to write one.

pfaffman · July 15, 2025, 12:52am

It shouldn’t be too hard to fix the internal links, and it looks like they just need .html added.

I thought that the --convert-links would fix those links…
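For anyone who wants to try the link fix, here is a rough sketch of it in Ruby. It rests on several assumptions: that the mirrored pages were saved with an .html suffix, that the archive will be served from the root of a web server (so root-relative links resolve), and that links use double-quoted href attributes. The function names are hypothetical, not part of any tool mentioned in this topic.

```ruby
# Host of the forum being retired (taken from the thread).
OLD_HOST = "https://forum.talksurf.com"

# Hypothetical helper: rewrites internal links so they point at local files.
# Links to `host` become root-relative, and paths without a file extension
# get ".html" appended. Query strings and #fragments are preserved.
def add_html_extension(html, host = OLD_HOST)
  # Optional host, then a path starting with a single "/", then any ?query/#fragment.
  pattern = /href="(?:#{Regexp.escape(host)})?(\/(?!\/)[^"#?]*)([^"]*)"/
  html.gsub(pattern) do
    path = Regexp.last_match(1)
    suffix = Regexp.last_match(2)
    path += ".html" if File.extname(path).empty? && !path.end_with?("/")
    %(href="#{path}#{suffix}")
  end
end

# Hypothetical helper: applies the rewrite in place to every .html file under `dir`.
def rewrite_archive(dir)
  Dir.glob(File.join(dir, "**", "*.html")).each do |file|
    File.write(file, add_html_extension(File.read(file)))
  end
end
```

Run `rewrite_archive` on a copy of the mirror rather than the only copy, since it overwrites files in place. Browsing the archive straight from disk would need relative paths instead, which this sketch does not attempt.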
