Any updates on the best way to create a static HTML archive of a Discourse site?

Update!

This might be the answer:

I looked at “Improving Discourse static HTML archive”, but it’s old.

I’m going to retire https://forum.talksurf.com/.

Yes, I’m going to archive a backup.

But what if I just want some browseable HTML files?

Should I just run archive-discourse.py from kitsandkats/ArchiveDiscourse on GitHub?

Or is there something better?

Thanks in advance!

CC: @pfaffman

Aloha,

Justin

Would something like the Wayback Machine be similar?

This worked. I had to make a slight code update.

But not much older than your Discourse version!

I have had some luck mirroring sites with wget. Something like:

    wget --mirror --page-requisites --convert-links --adjust-extension \
      --compression=auto --reject-regex "/search" --no-if-modified-since \
      --no-check-certificate --execute robots=off --random-wait --wait=1 \
      --user-agent="Googlebot/2.1 (+http://www.google.com/bot.html)" \
      --no-cookies --header "Cookie: _t=$COOKIE" \
      https://forum.talksurf.com/

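But you need to set COOKIE to the value of your _t cookie (Discourse’s login cookie), which you can copy from your browser’s developer tools while logged in.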

Send me an email and I’ll see what I can do.

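I’ve been doing this recently, and this is how I did it:

    # Serve a pre-rendered HTML page from the public/ directory
    # (two levels up from this controller file).
    def serve
      public_root = File.expand_path("../../public", __dir__)
      format = params[:format].presence || "html"
      file_path = File.expand_path("#{params[:path]}.#{format}", public_root)

      # Only serve real files inside public_root; a path such as
      # "../../config/secrets" would otherwise escape it.
      inside_root = file_path.start_with?(public_root + File::SEPARATOR)
      if inside_root && File.file?(file_path)
        send_file file_path, type: "text/html", disposition: "inline"
      else
        render plain: "404 Not Found", status: 404
      end
    end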

Just to let you know, this does not download the images or rewrite their URLs. The photos will still point to your server (which is about to be decommissioned!).
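
One way to find out which images are still loaded from the old server is to scan each archived page for img tags whose src is on that host. Below is a minimal, stdlib-only Ruby sketch; remote_image_urls is a hypothetical helper, and matching tags with a regex is a shortcut (it ignores srcset, for example) that an HTML parser would handle more robustly.

```ruby
require "uri"

# Captures the src attribute of each <img> tag. A regex keeps this sketch
# stdlib-only; an HTML parser would be more robust (srcset is ignored here).
IMG_SRC = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i

# Hypothetical helper: return the distinct image URLs in +html+ that are
# served from +host+ (absolute or protocol-relative "//host/..."), i.e. the
# images that will break once that server is switched off.
def remote_image_urls(html, host)
  urls = html.scan(IMG_SRC).flatten.select do |src|
    begin
      URI.parse(src).host&.casecmp?(host)
    rescue URI::Error
      false # skip malformed URLs
    end
  end
  urls.uniq
end
```

The resulting list could be written to a file, one URL per line, and fetched with wget --input-file=images.txt --force-directories (protocol-relative //… URLs would need https: prepended first). Pointing the src attributes at those local copies would be a separate step.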

Jay kindly sent me the dump, and I compared it to mine.

His technique works better in the sense that it saves the images.

However, his internal links point to the decommissioned site rather than to the archived articles. Still, all the articles are there, with their images.

It would be a “nice to have” if Discourse supported static export. :smile:

The good news is that you have all the data, so if anyone were inclined, they could write an exporter that builds the archive directly from a backup.

But we’re not likely to write one :wink:

It shouldn’t be too hard to fix the internal links; it looks like they just need .html added.
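
If the links really only need the .html suffix, a small post-processing pass over the mirror could do it. Below is a minimal Ruby sketch under a few assumptions: the mirror uses wget’s layout (a URL such as https://forum.talksurf.com/t/some-topic/123 saved as t/some-topic/123.html under the mirror root), internal links are either absolute links to the old host or root-relative /… links, and query strings can be dropped. localize_links is a hypothetical helper, not part of wget or Discourse.

```ruby
require "pathname"
require "uri"

# Matches href="..." or href='...' and captures the quote and the URL.
HREF_ATTR = /\bhref\s*=\s*(["'])([^"']*)\1/i

# Hypothetical helper (not part of wget or Discourse): rewrite links that
# still point at the retired forum so they open the mirrored .html file.
#   html      - contents of one archived page
#   page_path - that page's path inside the mirror, e.g. "t/some-topic/123.html"
#   host      - the old forum's host name, e.g. "forum.talksurf.com"
def localize_links(html, page_path, host)
  page_dir = Pathname.new(page_path).dirname

  html.gsub(HREF_ATTR) do
    whole, quote, target = $&, $1, $2
    uri = begin
      URI.parse(target)
    rescue URI::Error
      nil
    end

    # Internal = link to the old host, or a root-relative "/..." link.
    internal = uri && (uri.host&.casecmp?(host) ||
                       (uri.host.nil? && uri.scheme.nil? && target.start_with?("/")))
    next whole unless internal

    path = uri.path.delete_prefix("/") # the query string is dropped
    if path.empty? || path.end_with?("/")
      path += "index.html"  # wget saves ".../" as ".../index.html"
    elsif File.extname(path).empty?
      path += ".html"       # the suffix --adjust-extension adds
    elsif File.extname(path) != ".html"
      next whole            # leave stylesheets, images, etc. alone
    end

    # Make the link relative to the page so it works when opened from disk.
    local = Pathname.new(path).relative_path_from(page_dir).to_s
    local += "##{uri.fragment}" if uri.fragment
    "href=#{quote}#{local}#{quote}"
  end
end
```

To run it over the whole mirror, one could loop over Dir.glob("**/*.html", base: mirror_root), read each file, and File.write the result back, ideally on a copy of the mirror in case these assumptions do not hold for your pages.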

I thought --convert-links would fix those links…