Discourse is feeding JS to archive.org again

I archived https://meta.discourse.org/t/add-category-to-top-menu-so-that-category-can-be-landing-page/101668 as https://web.archive.org/web/20181113043546/https://meta.discourse.org/t/add-category-to-top-menu-so-that-category-can-be-landing-page/101668. The resulting page is blank, although “view source” shows that some content was loaded, and a quick look at the console reveals a lot of JS errors. I archive pages with open('https://web.archive.org/save/'+document.location), which should be equivalent to the click-to-archive function.
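For anyone who wants to reproduce this, the one-liner can be spelled out roughly as below. This is only a sketch: `buildSaveUrl` is a name made up for illustration, and the only thing taken from the post is the `https://web.archive.org/save/` prefix followed by the page address.

```javascript
// Build the Wayback Machine "save" address for a page.
// buildSaveUrl is a hypothetical helper name, not part of any API.
function buildSaveUrl(pageUrl) {
  return 'https://web.archive.org/save/' + pageUrl;
}

// In a browser console this opens the save page for the current tab,
// like the one-liner in the post. Outside a browser it does nothing.
if (typeof window !== 'undefined') {
  window.open(buildSaveUrl(String(window.location)));
}
```

Pasting it into the console on the topic page should behave the same as the one-liner.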

The crawler detection still seems to work for Google’s bot. According to “User Agent: Learn Your Web Browser’s User Agent Now” on WhoIsHostingThis.com, archive.org has not significantly changed its UA, so I have no idea what is wrong this time. (I did check access.log on my instance for hints, but for some reason the only requests for the pages around the archived timestamps come from my own UA. The robots.txt fetches by archive.org’s UA do appear in the logs, though.)
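The log check can be sketched roughly as follows. It assumes access.log is in the common “combined” format, where the User-Agent is the last double-quoted field on each line. The function names, the `ExampleArchiveBot` marker and the two sample lines are made up for illustration and are not real log data; substitute whatever string the archiver's UA actually contains.

```javascript
// Extract the User-Agent from one access-log line.
// Assumption: "combined" log format, UA is the last "quoted" field.
// A UA containing escaped quotes would confuse this simple pattern.
function userAgentOf(line) {
  const quoted = line.match(/"[^"]*"/g);
  return quoted && quoted.length > 0
    ? quoted[quoted.length - 1].slice(1, -1)
    : null;
}

// Return the log lines whose User-Agent contains `marker` (case-insensitive).
function requestsByAgent(logText, marker) {
  const needle = marker.toLowerCase();
  return logText.split('\n').filter((line) => {
    const ua = userAgentOf(line);
    return ua !== null && ua.toLowerCase().includes(needle);
  });
}

// Two made-up lines: a robots.txt fetch by a bot and a page fetch by a browser.
const sample = [
  '203.0.113.5 - - [13/Nov/2018:04:35:46 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleArchiveBot/1.0"',
  '198.51.100.7 - - [13/Nov/2018:04:35:47 +0000] "GET /t/some-topic/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
].join('\n');

console.log(requestsByAgent(sample, 'examplearchivebot'));
```

If the only matches are robots.txt requests, as in my logs, the page fetches themselves are arriving under a different UA.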


Discourse also feeds JS to archive.is, which should have been fine: archive.is runs the JS to produce a final DOM and does not use a special UA, so there is nothing to detect anyway. I am ready to file a bug with them for failing to run the JS, though…

I’m pretty sure this has never worked? See existing topic.

Closing this as we don’t need a dupe.
