What is the Discourse approach to SEO?

(John Mckay) #1

Hi guys

I really love Discourse, I am working on an SPA too, and have been following your development closely. When you make an architectural decision, we tend to roll with that as well.

You use RoR and EmberJS, we use ASP.Net MVC and AngularJS. Different technologies, but similar conceptually on the client and server.

We have noticed that you don’t seem to serve web crawlers (googlebot, bingbot, etc.) your JavaScript. You never followed Google’s hash fragment recommendations, which is good news because that scheme was recently deprecated.

So we have been doing the same: we opted not to follow the hash fragment recommendations either, and when a web crawler arrives at our website, we give it a server-generated version of that page. If a regular human arrives, they get the full SPA experience.

One of the things I have noticed, in particular, is that when I check Google cache from before we stopped giving Google our JS, it’s a complete mess. The Google cache page tries to hit our API (CORS says No!) and download views (again, CORS).

Since we stopped giving them the JS, and instead opted for the server-side rendered version, things are much nicer when viewing the Google cache version of the page.

However, Google seems consistently adamant that it can handle JavaScript. I have been talking to people on other forums who suggest that we should just let Google loose on our SPA, and that the cache view isn’t important.

Any feedback would be really welcome. Are you guys planning at some point to stop serving up server side rendered versions of any particular page to web crawlers, or are you going to continue to do so?

Do you honestly believe that Google can handle JS as well as they say they can?

Apologies if this is off topic, but hearing your opinions/views on this would be really great.

Thanks!


(Jeff Atwood) #2

If you visit with your user agent set to Google you will see what we produce. We have not had any problems with that approach so far. (And we do support the hash thing because of other search engines like Yandex, which are far less sophisticated than Google.)
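
For readers unfamiliar with "the hash thing": as I understand Google's (now deprecated) AJAX crawling scheme, a crawler that sees a `#!` URL requests the same page with the fragment moved into an `_escaped_fragment_` query parameter, and a page without a `#!` can opt in with `<meta name="fragment" content="!">`. Below is a minimal Python sketch of that mapping. The function names are hypothetical, the escaping is deliberately conservative rather than an exact copy of the spec's character list, and none of this is Discourse's actual code.

```python
from urllib.parse import parse_qs, quote, urlsplit, urlunsplit


def to_crawler_url(url):
    """Rewrite a '#!' URL into the '_escaped_fragment_' form a crawler would fetch."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # no hashbang, so nothing to rewrite

    # Percent-escape the fragment after the '!'. Keeping '=' and '/' readable
    # is an assumption of this sketch; over-escaping other characters is harmless.
    escaped = quote(parts.fragment[1:], safe="=/")

    param = "_escaped_fragment_=" + escaped
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def is_escaped_fragment_request(url):
    """True if the incoming URL carries the '_escaped_fragment_' parameter."""
    query = urlsplit(url).query
    # keep_blank_values so that a bare '?_escaped_fragment_=' still counts
    return "_escaped_fragment_" in parse_qs(query, keep_blank_values=True)
```

A server supporting the scheme would check `is_escaped_fragment_request` on each request and, when it is true, return the pre-rendered HTML instead of the JavaScript application.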


(John Mckay) #3

Thanks for taking the time to answer, and that’s good to know.

I was under the impression that having a # in the URL was the indication to any crawler that you follow that particular protocol; that was how you let a crawler know.

Is that not the case? Also, do you think it is worth putting in the work to implement that functionality now that Google has deprecated it?


(Jeff Atwood) #4

Not unless every search engine on the planet has deprecated it too. We do try to detect crawlers and send them the 1996 html 1.0 version of the site.


(John Mckay) #5

Ok, I thought it was mainly a Google thing.

Thanks very much :slight_smile:


(John Mckay) #6

Sorry, just to clarify, when googlebot visits Discourse, it gets the 1996 HTML 1.0 version of that page’s content?

I know that is what I get when setting my user agent to googlebot, but maybe you use other metrics to detect it and deliver them something different?


(Kane York) #7

Nope, just dumb user agent detection. If Google goes and does something like “let’s try loading the page with a clean Chrome browser to see if they’re pulling anything tricky” they’ll see the same content, just after the JS all loads.
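
As a rough illustration of what "dumb user agent detection" can look like, here is a minimal Python sketch. The token list and the function names are assumptions made up for this example, not Discourse's real implementation (which is in Ruby), and a real site would maintain its own list.

```python
# Hypothetical list of substrings that identify well-known crawlers.
CRAWLER_TOKENS = ("googlebot", "bingbot", "yandex", "baiduspider", "duckduckbot")


def is_crawler(user_agent):
    """Return True if the User-Agent header contains a known crawler token."""
    ua = (user_agent or "").lower()  # a missing header is treated as a normal visitor
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_view(user_agent):
    """Pick the plain HTML page for crawlers and the JS app for everyone else."""
    return "static_html" if is_crawler(user_agent) else "spa"
```

Both views are built from the same content, so a crawler that ignores the hint and runs the JavaScript anyway should end up seeing the same text.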


(Spooky) #8

For some reason the text “Not unless every search engine on the planet has deprecated it too” wasn’t indexed by Google, which worries me a bit.


(Jeff Atwood) #9

I think you are doing something wrong.