Canonical Meta Data Does Not Change Correctly in Discourse App when not loaded by a webcrawler

neounix · July 31, 2020, 5:00pm

There is a bug in the Discourse app related to updating the <link rel="canonical"> meta data element in the <head> section of the Discourse DOM.

Basically, when a browser client enters the application and the application is first loaded, the <link rel="canonical" href=""> element is set according to that initial page load. But when the user then clicks around in the app (normal user behavior) without manually reloading the page, the <link rel="canonical"> link does not update.

I have tested this bug and reproduced it on the meta site:

Fig 1. Enter meta from the home page. The canonical link is correct, as is the title element.

Fig 2. Visit a topic. The title element changes correctly, but the canonical link is not correct (does not update as it should).

Fig 3. Visit another topic. The title element changes correctly, but the canonical link is not correct (does not update as it should).
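
To make the expected behavior concrete, here is a minimal sketch of how the stale tag could be detected and corrected on the client side. It is not Discourse code: the function names are hypothetical, and it assumes the canonical URL is simply the current URL without its fragment, which may not match how Discourse computes it on the server.

```typescript
// Minimal shape of the <link rel="canonical"> element needed here;
// a real HTMLLinkElement satisfies it.
interface CanonicalLink {
  href: string;
}

// Assumed rule (hypothetical): canonical = current URL minus the fragment.
// Discourse may compute the canonical URL differently on the server.
function expectedCanonical(currentUrl: string): string {
  const url = new URL(currentUrl);
  url.hash = "";
  return url.toString();
}

// True when the tag still points at a previously visited page.
function isCanonicalStale(link: CanonicalLink, currentUrl: string): boolean {
  return link.href !== expectedCanonical(currentUrl);
}

// Rewrite the tag so it matches the page currently shown.
// Intended to be called after each client-side route change.
function syncCanonical(link: CanonicalLink, currentUrl: string): void {
  link.href = expectedCanonical(currentUrl);
}
```

In a browser, `link` would be the element returned by `document.querySelector('link[rel="canonical"]')` and `currentUrl` would be `window.location.href`, with `syncCanonical` called from whatever route-change hook the framework provides.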

Implications for SEO

This bug could adversely affect SEO because, when Google indexes the page, if Googlebot is not “hard reloading” every page, the canonical information will be incorrect for each page (as in the image sequence above).

Reproducibility

I have reproduced this bug consistently on both the meta site and our site.

Notes

I have seen these kinds of node.js (SPA) lifecycle issues before with other web frameworks (not only Ember), where DOM elements are not updated because of how the lifecycle hooks within the web application framework behave.

Falco · July 31, 2020, 5:04pm

This won’t ever happen, as we don’t serve the SPA to Googlebot. You can set your User Agent to the Googlebot UA to see how it works.
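
The same check can be scripted instead of changing the browser's User Agent by hand. The sketch below requests a page with the published Googlebot User-Agent string and reports the canonical URL found in the response. The function names are hypothetical, the regex is a quick check rather than a real HTML parser, and it assumes a runtime with a global `fetch` that allows setting the `User-Agent` header (for example Node 18 or later).

```typescript
// User-Agent string published for Googlebot. That Discourse's crawler
// detection matches on it is an assumption, not something verified here.
const GOOGLEBOT_UA =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

// Pull the href out of the first <link rel="canonical"> tag in an HTML string.
// Returns null when no such tag (or no href attribute) is found.
function extractCanonical(html: string): string | null {
  const tag = html.match(/<link\b[^>]*\brel\s*=\s*["']canonical["'][^>]*>/i);
  if (!tag) return null;
  const href = tag[0].match(/\bhref\s*=\s*["']([^"']*)["']/i);
  return href ? href[1] : null;
}

// Fetch a page while identifying as Googlebot and return its canonical URL.
async function canonicalSeenByCrawler(pageUrl: string): Promise<string | null> {
  const response = await fetch(pageUrl, {
    headers: { "User-Agent": GOOGLEBOT_UA },
  });
  return extractCanonical(await response.text());
}
```

Calling `canonicalSeenByCrawler` with a topic URL and comparing the result to that URL shows what a crawler would see for that page.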

neounix · July 31, 2020, 5:16pm

Hi @Falco

Thanks for your reply.

Yes, it’s not a problem when the UA is set to GoogleBot (just confirmed).

I agree that it may not be a problem for SEO, since you do not serve the SPA to GoogleBot, but it is nonetheless a bug in the SPA.

Also, I need to think about the implications of not serving the SPA to GoogleBot.

Thanks for letting me know that “small fact”…

(Note: I had no idea that “Suggested Topics” were not served to GoogleBot, but now I know … thanks for informing me).

Falco · July 31, 2020, 5:51pm

We serve a completely different document to crawlers, because not every crawler can run JavaScript and we want Discourse to be accessible to those clients too. Even if they receive reduced functionality, they can consume all the content.

neounix · July 31, 2020, 6:02pm

Thank you so much for letting me know.

Now I realize that some earlier discussions about the SPA, “infinite scroll” and other SEO-related issues were completely wrong, since the SPA is not served to GoogleBot.

This changes my approach to some custom code I wrote recently; and now I know to check using the GoogleBot UA in the console.

Thanks so much for that, @Falco ! Much appreciated.

Question:

What is the best way to add a single custom JavaScript file to the HTML that is rendered for GoogleBot?

Is there a “standard way” to modify the HTML served to bots?

The reason I ask is that we have some custom code which was created in a plugin I wrote (meant for bots); but I checked using the GoogleBot UA in the console (thanks again for telling me that I need to do that), and none of that custom plugin code is consumed by GoogleBot.

neounix · August 1, 2020, 5:20am

In the interim, since I cannot accomplish what I want in a (handlebars-based) plugin for the HTML served to crawlers, we have decided to simply strip out the canonical tags from Discourse. This is a partial solution until I can figure out how to modify the canonical tag with some JavaScript for web crawlers.

Discourse provides a nice mechanism for these kinds of changes in the container yml files, so that is what I have done today.

I am very grateful to Discourse meta for pointing out that the Discourse app served to (identified) crawlers is not the same as the pages served to users.

Please note that I am not recommending this “interim solution” to other Discourse sys admins. I am simply sharing what I have decided to do, at this time, and how I did it (until we come up with a more interesting solution).
