Make Discourse play nice with the Wayback Machine

Replay is currently broken: the JS runs, and the Ember Router breaks because of the pathname change.

Thanks to the improved browser detection from @david, there is an extremely ugly but also tempting fix to get new captures to render properly: just patch browser-detect so it detects the replay and yanks out the noscript version.

https://github.com/riking/discourse/commit/6a83c83bd3acab37f7a3e24f6aa4a14081bb2249
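To make the "pathname change" concrete, here is a minimal sketch of how a replay path could be recognised. It is written in Python for illustration only and is not the code from the commit above. It assumes the usual Wayback replay layout `/web/<timestamp>/<original URL>`, with an optional two-letter modifier such as `id_` after the timestamp; the function name `original_url_from_replay` is hypothetical.

```python
import re
from typing import Optional

# Assumed replay path layout: /web/<timestamp><optional modifier>/<original URL>
# The timestamp is up to 14 digits (YYYYMMDDhhmmss); the modifier looks like "id_".
REPLAY_PATH = re.compile(r"^/web/(\d{1,14})(?:[a-z]{2}_)?/(.+)$")


def original_url_from_replay(pathname: str) -> Optional[str]:
    """Return the archived page's original URL, or None if this is not a replay path."""
    match = REPLAY_PATH.match(pathname)
    return match.group(2) if match else None
```

A router that expects `/t/some-topic/123` but sees `/web/20200515120000/https://…/t/some-topic/123` has no matching route, which is consistent with the breakage described above.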

The problem is that if we start serving that script and, by some miracle, the JS later becomes able to run under replay, all the old archived pages will still be forced into the no-JS view.

Now that I write that out, that’s probably not too high a price to pay for getting working archive playback today. (Draft PR) I have been talked out of actually doing this.


(Was @dan, not me)

Is our existing Wayback Machine bypass broken?
https://github.com/discourse/discourse/blob/cb8f8de422b8b270dc57f3614d3d9d718bfc40ef/lib/crawler_detection.rb#L18-L18


Is there any particular reason we are not checking for their user agent (archive.org_bot)? It seems to be a less fragile solution.

https://archive.org/details%2Farchive.org_bot%2F


Their “liveweb” thing does not send the user agent, I think:


I believe some things changed (see the dates). I think we should be checking both of them.

EDIT: Submitted a PR for this:

https://github.com/discourse/discourse/pull/9777
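As a rough illustration of "checking both", the sketch below treats a request as coming from the Wayback Machine if either signal matches. It is Python for illustration only and is not the code from the PR. The `archive.org_bot` token comes from this thread; the second signal is an assumption, since the thread does not say what the existing bypass inspects. Here it is taken to be a proxy header (such as `Via`) that mentions `archive.org`. The function name `is_wayback_request` is hypothetical.

```python
from typing import Optional

WAYBACK_BOT_TOKEN = "archive.org_bot"  # user agent token named in the thread
WAYBACK_HOST_TOKEN = "archive.org"     # assumption: appears in a proxy header such as Via


def is_wayback_request(user_agent: Optional[str], via_header: Optional[str] = None) -> bool:
    """Return True if either the user agent or the proxy header points at the Wayback Machine."""
    ua = (user_agent or "").lower()
    via = (via_header or "").lower()
    # Checking both signals covers captures that send the bot user agent
    # and captures that only identify themselves through the proxy header.
    return WAYBACK_BOT_TOKEN in ua or WAYBACK_HOST_TOKEN in via
```

Checking either signal alone would miss the case raised earlier in the thread, where one capture path may not send the bot user agent at all.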


Would be lovely to see this working again. I am promoting Discourse as a central hub for Solid Project, especially for core team members and experts working on standardization of Solid, but this issue is an important reason for them to be unwilling to do so.


The PR was merged, so it should be working now.


Just confirmed by doing a “save outlinks” on a /top/yearly… working fully right now.
