Google's Cache fails to render topics


(Kane York) #1

After reading this post and going back in time to this one, I saw that someone had linked a Google search query and, of course, I had to try it.
Well, the topic came up in the search (of course), so I tried clicking on the cache link: http://webcache.googleusercontent.com/search?q=cache:KQQcCO_syekJ:meta.discourse.org/t/will-posts-show-up-on-google/1184+&cd=1&hl=en&ct=clnk&gl=za

This results in an empty page with Google’s standard ‘cached page’ header, along with several errors in the JS console.

The “Text-only version” link correctly gives the <noscript> version: http://webcache.googleusercontent.com/search?q=cache:KQQcCO_syekJ:meta.discourse.org/t/will-posts-show-up-on-google/1184&hl=en&gl=za&strip=1


(Jeff Atwood) #2

Isn’t this because the CDN URLs changed for the cached resources since then? Can’t check easily on iPad.


(Michael - DiscourseHosting.com) #3

All the assets return 404. They do go through a CDN, but the same thing would have happened without one after a software update, because the asset fingerprints change. This seems like a generic asset pipeline issue to me, not one specific to Discourse.
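To make the fingerprint point concrete, here is a minimal sketch of why a cached page ends up pointing at assets that no longer exist. Everything in it is an assumption for illustration: `toyDigest`, `fingerprintedName`, `publish` and `statusFor` are hypothetical names, and the small FNV-1a-style hash only stands in for the cryptographic digest a real asset pipeline would use.

```typescript
// Toy 32-bit FNV-1a-style hash over UTF-16 code units. It stands in for the
// cryptographic digest a real asset pipeline would use (assumption).
function toyDigest(content: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, then wrap to 32 bits.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Embeds the content digest in the file name: application.js -> application-<digest>.js
function fingerprintedName(logicalName: string, content: string): string {
  const dot = logicalName.lastIndexOf(".");
  const base = dot === -1 ? logicalName : logicalName.slice(0, dot);
  const ext = dot === -1 ? "" : logicalName.slice(dot);
  return base + "-" + toyDigest(content) + ext;
}

// Hypothetical deploy step: returns the asset paths the server will answer.
// Files from earlier releases are not kept, so their paths stop resolving.
function publish(files: { [name: string]: string }): string[] {
  return Object.keys(files).map((name) => "/assets/" + fingerprintedName(name, files[name]));
}

// 200 if the path belongs to the currently published release, otherwise 404.
function statusFor(path: string, published: string[]): number {
  return published.indexOf(path) !== -1 ? 200 : 404;
}

const releaseOne = publish({ "application.js": "console.log('v1');" });
const releaseTwo = publish({ "application.js": "console.log('v2');" });

// The path a cached copy of the page still references after the update.
const cachedPath = releaseOne[0];
const statusBeforeUpdate = statusFor(cachedPath, releaseOne); // 200
const statusAfterUpdate = statusFor(cachedPath, releaseTwo); // 404
```

The CDN plays no part in this sketch, which matches the point above: any changed file gets a new name, and a snapshot taken before the update keeps asking for the old one.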


(Michael - DiscourseHosting.com) #4

I was thinking: could this (and other problems where failing JavaScript results in an empty page) be solved by implementing a top-level exception handler that moves everything inside <noscript> outside of it, so you at least get a list of links instead of an empty page?
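A rough sketch of what such a handler could look like, under stated assumptions: `PageLike`, `revealNoscript` and `bootWithFallback` are hypothetical names, not anything Discourse ships. It relies on the fact that, when scripting is enabled, browsers keep the contents of a body-level <noscript> as raw text, so the markup can be read from `textContent` and written back as HTML.

```typescript
// Minimal structural types so the sketch type-checks without browser typings.
// They are shaped like the parts of a browser Document that are used here.
interface NoscriptLike {
  textContent: string | null;
}
interface PageLike {
  getElementsByTagName(name: string): ArrayLike<NoscriptLike>;
  body: { innerHTML: string };
}

// Replaces the (empty) page body with the markup held inside <noscript>.
// Returns false, and changes nothing, when there is no fallback markup.
function revealNoscript(page: PageLike): boolean {
  const blocks = page.getElementsByTagName("noscript");
  let markup = "";
  for (let i = 0; i < blocks.length; i++) {
    markup += blocks[i].textContent || "";
  }
  if (markup === "") {
    return false;
  }
  page.body.innerHTML = markup;
  return true;
}

// Top-level guard: run the application entry point and, if it throws,
// show the <noscript> content instead of leaving an empty page.
function bootWithFallback(page: PageLike, startApp: () => void): void {
  try {
    startApp();
  } catch (err) {
    revealNoscript(page);
  }
}
```

In a browser this would be called as `bootWithFallback(document, startApp)`, where `startApp` is whatever boots the application. Two caveats for the cached-page case: the guard would have to be inlined in the HTML, since the external bundles are exactly what fails to load, and a script that returns 404 does not throw, so it would also need a capture-phase `error` listener on the window (or `onerror` attributes on the script tags) that calls `revealNoscript`.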


(Jeff Atwood) #5

Anything that helps the “empty page, nothing to diagnose” support problem we see a lot would be beneficial.


(Joel Limberg) #6

Possibly unrelated, but I discovered this when I did a quick view source:

The current topic has this in the <head>: <link href="http://866185888.r.cdn77.net/t/googles-cache-renders-empty-topics/11457" rel="canonical" />

This URL should probably be CDN-less.
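As an illustration of the distinction, here is a small sketch that builds asset URLs from the CDN origin but the canonical link from the site's own origin. `SiteConfig`, `joinUrl`, `assetUrl` and `canonicalLink` are hypothetical names, and using meta.discourse.org as the base URL for this topic is an assumption taken from the links earlier in the thread.

```typescript
// Hypothetical settings: the site's own origin and the CDN origin for assets.
interface SiteConfig {
  baseUrl: string;
  cdnUrl: string;
}

// Joins an origin and a path with exactly one slash between them.
function joinUrl(origin: string, path: string): string {
  return origin.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}

// Static assets may be served from the CDN.
function assetUrl(config: SiteConfig, path: string): string {
  return joinUrl(config.cdnUrl, path);
}

// The canonical link names the page's authoritative address, so it is built
// from the site's own origin and never from the CDN origin.
// No HTML escaping is done here; the path is assumed to be safe.
function canonicalLink(config: SiteConfig, topicPath: string): string {
  return '<link href="' + joinUrl(config.baseUrl, topicPath) + '" rel="canonical" />';
}

const exampleConfig: SiteConfig = {
  baseUrl: "http://meta.discourse.org", // assumed site origin
  cdnUrl: "http://866185888.r.cdn77.net", // CDN host quoted above
};

const exampleCanonical = canonicalLink(exampleConfig, "/t/googles-cache-renders-empty-topics/11457");
```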


(Jeff Atwood) #7

Thanks, that is indeed unrelated, but it is definitely a bug and is now fixed.


(Jeff Atwood) #8

Google Cache should absolutely render topics now: we are stripping out almost everything except the content when presenting topics to spiders.
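For readers wondering what "stripping out almost everything" could look like in practice, here is a minimal sketch of the general idea and not the actual Discourse implementation. The crawler pattern, the `Post` shape and all function names are assumptions made for illustration.

```typescript
// Hypothetical, deliberately short list of crawler markers; real lists are longer.
const CRAWLER_PATTERN = /googlebot|bingbot|slurp|duckduckbot|baiduspider/i;

// True when the User-Agent header looks like a known crawler.
function isCrawler(userAgent: string | undefined): boolean {
  return userAgent !== undefined && CRAWLER_PATTERN.test(userAgent);
}

// Assumed minimal shape of a post; the html field is taken to be safe markup.
interface Post {
  author: string;
  html: string;
}

// Content-only rendering: title and posts as plain HTML, no scripts or app shell.
function renderForCrawler(title: string, posts: Post[]): string {
  const body = posts
    .map((p) => '<div class="post"><b>' + p.author + "</b>" + p.html + "</div>")
    .join("\n");
  return "<h1>" + title + "</h1>\n" + body;
}

// Picks the representation based on who is asking.
function renderTopic(userAgent: string | undefined, title: string, posts: Post[]): string {
  if (isCrawler(userAgent)) {
    return renderForCrawler(title, posts);
  }
  // Regular browsers get the JavaScript application shell (placeholder here).
  return '<div id="app"></div><script src="/assets/application.js"></script>';
}
```

Because the crawler view references no fingerprinted scripts, a cached snapshot of it has nothing that can return 404 later.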


(Jeff Atwood) #9