Any updates on the best way to create an HTML archive of a static site?

Update!

This might be the answer:

I looked at:

Improving Discourse static HTML archive.

It’s old.

I’m going to retire https://forum.talksurf.com/.

Yes, I’m going to archive a backup.

But what if I just want some browseable HTML files?

Should I just run archive-discourse.py from kitsandkats/ArchiveDiscourse on GitHub?

Or is there something better?

Thanks in advance!

CC: @pfaffman

Aloha,

Justin

Would something like the Wayback Machine be similar?

This worked. I had to make a slight code update.


But not much older than your Discourse version!

I have had some luck mirroring sites with wget. Something like:

wget --mirror --page-requisites --convert-links --adjust-extension --compression=auto --reject-regex "/search" --no-if-modified-since --no-check-certificate --execute robots=off --random-wait --wait=1 --user-agent="Googlebot/2.1 (+http://www.google.com/bot.html)" --no-cookies --header "Cookie: _t=$COOKIE" https://forum.talksurf.com/

But you need to get the value of the cookie named _t from a logged-in browser session and put it in $COOKIE.
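
In case it helps, here is a rough sketch of how that wget command could be wrapped so the cookie is passed in rather than pasted into the command line. The wget flags are the ones from the command above. The function name mirror_forum, the WGET override (handy for a dry run with echo), and the file name mirror.sh are just made up for this example. The _t value can usually be copied from the browser's developer tools (the cookie list for the forum's domain) while logged in, but treat that as an assumption about your setup.

```shell
#!/bin/sh
# Sketch only. mirror_forum and the WGET override are hypothetical names
# for this example; the wget flags are copied from the command above.

# $1 = value of the _t cookie, $2 = URL of the site to mirror
mirror_forum() {
  "${WGET:-wget}" \
    --mirror --page-requisites --convert-links --adjust-extension \
    --compression=auto \
    --reject-regex "/search" \
    --no-if-modified-since --no-check-certificate \
    --execute robots=off \
    --random-wait --wait=1 \
    --user-agent="Googlebot/2.1 (+http://www.google.com/bot.html)" \
    --no-cookies --header "Cookie: _t=$1" \
    "$2"
}

# Only runs when COOKIE is set, e.g.: COOKIE='...' sh mirror.sh
if [ -n "${COOKIE:-}" ]; then
  mirror_forum "$COOKIE" "https://forum.talksurf.com/"
fi
```

Setting WGET=echo before calling mirror_forum prints the arguments instead of downloading anything, which is a cheap way to check the cookie header before starting a long crawl.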

Send me an email and I’ll see what I can do.