Make Discourse play nice with the Wayback Machine

ibnesayeed · December 2, 2016, 10:56pm

There are many headless-browser-based archiving efforts, including http://archive.is/, an on-demand, single-page archiving system. It renders the page using PhantomJS and then archives the rendered DOM plus the necessary assets. However, doing this at massive scale (not just for on-demand pages) takes a lot of time, because PhantomJS, like any other renderer, is orders of magnitude slower than traditional vanilla crawlers such as Heritrix, which is used by the Internet Archive and many other web archives.
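Because of that cost gap, one way to scale is to send only the pages that actually depend on JavaScript to the slow renderer and let the fast crawler handle everything else. The sketch below illustrates that routing idea under loudly stated assumptions: the heuristic (page loads scripts and its static HTML has little visible text, like an empty app shell) and the names `needs_headless`, `TwoTierQueue`, and `min_text` are hypothetical and are not the classifier from the research mentioned below.

```python
import re
from dataclasses import dataclass, field

# Hypothetical heuristic, not from the paper: a page "needs" a headless
# browser if it loads scripts AND its static HTML carries little visible text.
SCRIPT_RE = re.compile(r"<script\b", re.IGNORECASE)
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b.*?</\1>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def visible_text_length(html: str) -> int:
    """Count non-whitespace characters left after dropping script/style and tags."""
    without_code = SCRIPT_STYLE_RE.sub("", html)
    text = TAG_RE.sub(" ", without_code)
    return len("".join(text.split()))


def needs_headless(html: str, min_text: int = 200) -> bool:
    """True if the page looks like a JS-rendered shell (assumed threshold)."""
    return bool(SCRIPT_RE.search(html)) and visible_text_length(html) < min_text


@dataclass
class TwoTierQueue:
    """Routes URLs to a fast crawler tier or a slow headless-browser tier."""
    crawler_tier: list = field(default_factory=list)
    browser_tier: list = field(default_factory=list)

    def route(self, url: str, html: str) -> str:
        if needs_headless(html):
            self.browser_tier.append(url)
            return "browser"
        self.crawler_tier.append(url)
        return "crawler"


if __name__ == "__main__":
    q = TwoTierQueue()
    shell = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
    static = "<html><body><p>" + "Archived text. " * 20 + "</p></body></html>"
    print(q.route("https://example.com/t/1", shell))    # browser
    print(q.route("https://example.com/about", static))  # crawler
```

In a real pipeline the static HTML would come from the vanilla crawler's first fetch, so only the pages routed to the browser tier pay the rendering cost.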

Here is relevant research on the topic: "Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations". Below is a blog post summarizing the research and related resources.
