Discourse is feeding js to archive.org again

Artoria2e5 · November 13, 2018, 4:58am

I archived https://meta.discourse.org/t/add-category-to-top-menu-so-that-category-can-be-landing-page/101668 into https://web.archive.org/web/20181113043546/https://meta.discourse.org/t/add-category-to-top-menu-so-that-category-can-be-landing-page/101668. The resulting page is blank, although “view source” shows that some content was loaded. A quick glance at the console reveals tons of JS errors. I archive pages with open('https://web.archive.org/save/'+document.location), which should be equivalent to the click-to-archive function.
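For reference, the one-liner above only concatenates the save endpoint with the current page address. A minimal sketch follows; the helper name `buildSaveUrl` is made up for illustration and is not part of any archive.org or Discourse API.

```javascript
// Hypothetical helper: builds the Wayback Machine save URL for a page address,
// using the same string concatenation as the one-liner in the post.
function buildSaveUrl(pageUrl) {
  return 'https://web.archive.org/save/' + String(pageUrl);
}

// In a browser console this would be used as:
//   open(buildSaveUrl(document.location));
const saveUrl = buildSaveUrl(
  'https://meta.discourse.org/t/add-category-to-top-menu-so-that-category-can-be-landing-page/101668'
);
console.log(saveUrl);
```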

The crawler detection still seems to work with Google’s bot. archive.org has not significantly changed its UA according to User Agent: Learn Your Web Browser’s User Agent Now - WhoIsHostingThis.com, so I have no idea what’s wrong this time. (I did try to peek at access.log for hints, but for some reason I only see my own UA GETting the pages around the archived timestamps on my instance… The robots.txt crawls by archive.org’s UA are present in the logs, though.)
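To make the failure mode concrete, here is a rough sketch of how UA-based crawler detection generally works. It is not Discourse's actual implementation: the pattern list and the names `CRAWLER_PATTERNS`, `isCrawler`, and `chooseResponse` are assumptions for illustration only. If a save request shows up with an ordinary browser UA, as the access.log lines might suggest, a check like this would hand it the JS app rather than the static crawler view.

```javascript
// Illustrative substrings only; an assumption, not Discourse's real crawler list.
const CRAWLER_PATTERNS = ['Googlebot', 'archive.org_bot'];

// Returns true when the User-Agent contains any known crawler substring.
function isCrawler(userAgent) {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();
  return CRAWLER_PATTERNS.some((pattern) => ua.includes(pattern.toLowerCase()));
}

// Picks which variant of the page to serve for a given User-Agent.
function chooseResponse(userAgent) {
  return isCrawler(userAgent) ? 'static-html' : 'js-app';
}

const botUa = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
const browserUa = 'Mozilla/5.0 (X11; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0';

console.log(chooseResponse(botUa));
console.log(chooseResponse(browserUa));
```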

Discourse also feeds JS to archive.is, which should’ve been fine because archive.is does run the JS into a final DOM and does not use a special UA, so there’s no detection for it anyway. I am ready to file bugs with them for failing to run the JS though…

codinghorror · November 13, 2018, 5:24am

I’m pretty sure this has never worked? See existing topic.

Closing this as we don’t need a dupe.
