I want to update the rel=canonical href using JavaScript

KranthiKiranGude · August 1, 2020, 4:00pm

I have a few duplicate pages on my domain, and I need to point the canonical tag of each duplicate page to the original page using JavaScript. (Deleting the duplicate pages is not an option, as they get considerable traffic.)
Could someone suggest how to update the href of a tag using JavaScript in Discourse?

neounix · August 2, 2020, 4:58am

Here ya go, @KranthiKiranGude: this is how you could change the href attribute in JavaScript. First you select the DOM element, then you change the attribute.

<script>
// Select the existing canonical <link> (null if the page has none yet)
var uC = document.querySelector("link[rel='canonical']");
var newURL = "https://my.coolforum.com/newlink";
if (uC) {
  uC.setAttribute("href", newURL);
}
</script>

Of course, you will need some logic based on the page you want to operate on.

Generic example logic:

<script>
// Only rewrite the canonical on the duplicate page (hypothetical path below)
if (window.location.pathname === "/t/my-duplicate-topic/123")
{
   var uC = document.querySelector("link[rel='canonical']");
   var newURL = "https://my.coolforum.com/newlink";
   if (uC) {
      uC.setAttribute("href", newURL);
   }
}
</script>
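With several duplicate pages, the generic logic above could be driven by a lookup table instead of one `if` per page. This is a minimal sketch, assuming duplicates are identified by their URL path; `CANONICAL_OVERRIDES`, `canonicalOverrideFor`, `applyCanonicalOverride` and every path and URL in it are hypothetical placeholders. It matches exact paths only (ignoring a trailing slash), so other URL variants of the same page would need their own entries. The code would go inside the same `<script>` element.

```javascript
// Hypothetical map: duplicate page path -> preferred (original) URL.
// Replace these entries with your own duplicate/original pairs.
const CANONICAL_OVERRIDES = {
  "/t/duplicate-topic-a/101": "https://my.coolforum.com/t/original-topic/100",
  "/t/duplicate-topic-b/102": "https://my.coolforum.com/t/original-topic/100",
};

// Return the override URL for a path, or null if the page is not a listed duplicate.
function canonicalOverrideFor(pathname) {
  // Strip trailing slashes so "/t/x/1/" and "/t/x/1" hit the same entry.
  const key = pathname.length > 1 ? pathname.replace(/\/+$/, "") : pathname;
  return Object.prototype.hasOwnProperty.call(CANONICAL_OVERRIDES, key)
    ? CANONICAL_OVERRIDES[key]
    : null;
}

// Update the existing canonical <link> only when the current page is listed.
// Returns true if the href was changed, false otherwise.
function applyCanonicalOverride(doc, pathname) {
  const target = canonicalOverrideFor(pathname);
  if (!target) return false;
  const link = doc.querySelector("link[rel='canonical']");
  if (!link) return false; // no canonical tag present when the script ran
  link.setAttribute("href", target);
  return true;
}

// Only touch the DOM when running in a browser.
if (typeof document !== "undefined") {
  applyCanonicalOverride(document, window.location.pathname);
}
```

Keeping the pairs in one object makes adding another duplicate a one-line change, and any page not listed is left untouched.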

Hope this helps.

KranthiKiranGude · August 2, 2020, 6:11am

Hi @neounix,

I have tried your code, but instead of updating the href, a new script tag got generated:

I have added this script in the “</head>” section.

neounix · August 2, 2020, 6:37am

Hi @KranthiKiranGude

Please post the exact code you used and where exactly you added it, including a screenshot of the entry in the </head> section you mentioned.

Thanks!

It seems normal that new JavaScript shows up in the page when you add more JavaScript.

BTW, you will need to check the DOM in the browser dev tools (the Elements tab), not in the page source.
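To see the page source and the live DOM side by side, something like the following could be pasted into the browser console. It is a rough sketch: `canonicalFromHtml` and `compareCanonical` are hypothetical helpers, the HTML parsing is regex-based and assumes quoted attribute values, and re-fetching the page with `fetch` returns the HTML as the server sent it, without running any scripts.

```javascript
// Return the href of the first <link rel="canonical"> found in raw HTML text, or null.
// Regex-based, so it only handles ordinary markup with quoted attribute values.
function canonicalFromHtml(html) {
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];
  for (const tag of linkTags) {
    if (/\brel\s*=\s*["']canonical["']/i.test(tag)) {
      const m = tag.match(/\bhref\s*=\s*["']([^"']*)["']/i);
      if (m) return m[1];
    }
  }
  return null;
}

// Browser-only: compare the canonical in the served source with the one in the DOM.
async function compareCanonical() {
  // Re-fetch the current URL to get the raw HTML (no JavaScript is executed on it).
  const res = await fetch(window.location.href);
  const sourceHref = canonicalFromHtml(await res.text());
  // Read the canonical from the live, script-modified DOM.
  const domLink = document.querySelector("link[rel='canonical']");
  const domHref = domLink ? domLink.getAttribute("href") : null;
  console.log("page source:", sourceHref);
  console.log("rendered DOM:", domHref);
  return { sourceHref, domHref };
}
```

Calling `compareCanonical()` in the console logs both values; if the override script has run, the two would be expected to differ.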

KranthiKiranGude · August 2, 2020, 6:47am

Hi @neounix,

This is the script I have added. This is just to test it out.

neounix · August 2, 2020, 6:51am

I understand.

BTW, you are missing an opening quote in the conditional statement of your script…

KranthiKiranGude · August 2, 2020, 7:00am

Hi @neounix,

It worked in the dev console. But in the page source, it still references the original URL.
If I am not mistaken, search engines read the page source, not the DOM elements. Please correct me if I am wrong.

neounix · August 2, 2020, 7:04am

I’m not sure about that, to be honest. I had assumed that modern search engines (Googlebot) read the DOM, but now that I think about it, it makes sense that search engines might only read the source and not the DOM.

But when I Google this to check, I find:

SEO signals in the DOM (page titles, meta descriptions, canonical tags, meta robots tags, etc.) are respected. Content dynamically inserted in the DOM is also crawlable and indexable. Furthermore, in certain cases, the DOM signals may even take precedence over contradictory statements in HTML source code. This will need more work, but was the case for several of our tests.

Reference:

KranthiKiranGude · August 2, 2020, 7:07am

Hi @neounix,

Thanks a lot for your help. I will also research this part. Really thankful to you.

neounix · August 2, 2020, 7:12am

Welcome!

Please post back and let us know the results of your research.

Another method, which I have been working on in my spare time lately, is to modify this Discourse Ruby lib file directly:

https://github.com/discourse/discourse/blob/master/lib/canonical_url.rb

You might consider something along those lines if you have no joy with the DOM manipulation JS technique, @KranthiKiranGude

KranthiKiranGude · August 2, 2020, 7:23am

Hi @neounix,

I tested the page using the URL Inspection tool, and Google recognizes the updated URL.

neounix · August 2, 2020, 7:26am

Perfect… glad to hear it worked.

Thanks for testing and posting back.

PS: That JS DOM method is a lot easier than manipulating canonical_url.rb

RGJ · August 2, 2020, 7:47am

I’m not sure overriding the canonical using JavaScript will work, since the canonical is handled more at the spider level (i.e. the part that retrieves and collects data) than at the indexer level (the part of a bot that interprets data and stores it in the search index).

Unsolicited advice: you might want to read this topic so you can put those overrides in a plugin:

neounix · August 2, 2020, 7:57am

Yeah, me too. The jury is still out on that one.

But Google searches on this topic turn up a lot: many people do this, and many say Google respects the DOM changes (some say it does not, so there does not seem to be a strong consensus on the topic). See for example:

I think if I were going to do it, I would (1) delete the original canonical tag from the page source and then (2) insert a new canonical tag into the DOM with JS.
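The two-step idea could be sketched as follows. This is a minimal sketch, assuming it runs in the browser once `<head>` exists; `replaceCanonical` and the target URL are hypothetical. Note that it performs both steps in the DOM: removing the tag from the served HTML itself would be a server-side change, which client-side JavaScript cannot make.

```javascript
// Step (1): remove every existing canonical <link> so that only one remains.
// Step (2): append a fresh canonical <link> pointing at the preferred URL.
function replaceCanonical(doc, href) {
  doc.querySelectorAll("link[rel='canonical']").forEach((el) => el.remove());

  const link = doc.createElement("link");
  link.setAttribute("rel", "canonical");
  link.setAttribute("href", href);
  doc.head.appendChild(link);
  return link;
}

// Only touch the DOM when running in a browser.
if (typeof document !== "undefined") {
  // Hypothetical target URL; substitute the original page's address.
  replaceCanonical(document, "https://my.coolforum.com/t/original-topic/100");
}
```

Removing all existing canonical tags first avoids leaving two conflicting canonicals in the rendered DOM.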

Then, over time, we can simply look at Google Search Console and see which URL Google selected as canonical.

See also:

neounix · August 2, 2020, 8:42am

Because many people consider this important for SEO, I checked this again in light of the confirmation by @KranthiKiranGude.

According to developers.google.com, “Understand the JavaScript SEO basics”:

Googlebot supports web components. When Googlebot renders a page, it flattens the shadow DOM and light DOM content. This means Googlebot can only see content that’s visible in the rendered HTML. To make sure that Googlebot can still see your content after it’s rendered, use the Mobile-Friendly Test or the URL Inspection Tool and look at the rendered HTML.

Because (1) @KranthiKiranGude used the URL Inspection Tool in his testing and (2) he confirmed the canonical was changed as expected, it follows that, per Google, Googlebot does indeed “see” and register this DOM change after the page is rendered.

Reference:

https://developers.google.com/search/docs/guides/javascript-seo-basics#web-components

RGJ · August 2, 2020, 9:02am

Yeah, I totally support the idea of Google flattening the DOM contents like that while indexing.

But some/most meta tags have their semantics at the HTTP protocol level rather than at the HTML level, despite being present in the HTML. I emphasized ‘while indexing’ because I am not sure it flattens the DOM like that and takes the updated canonical URL into account while crawling.

(To put it differently, I’m not sure whether ‘DOM contents’ also covers ‘metadata embedded in the content’. Yes, it sees it that way, but I’m not sure it will use it that way.)

Maybe this article explains it better: How Google Crawls Your Website and Indexes Your Content

When Google needs to crawls JavaScript sites, an addition stage is required that traditional HTML content doesn’t need. It is know as the rendering stage, which something takes additional time. The indexing stage and rendering stage are separate phases, which lets Google index the non-JavaScript content first.

neounix · August 2, 2020, 9:20am

Not really, sorry. That article by www.hillwebcreations.com does not even mention the DOM, how to inspect the DOM, etc., and, at least to me, it reads as “dated and opinionated” (neither current nor factual).

Personally, I prefer these two well-written references, which in my mind are more authoritative, factual, and well referenced:

https://developers.google.com/search/docs/guides/javascript-seo-basics#web-components

and the first one, where they actually ran tests (long before Googlebot switched to a Chromium core, which can read the DOM (JavaScript) even better):

We Tested How Googlebot Crawls Javascript And Here’s What We Learned

After my research, I tend to agree with the Google developers: they index (and take their SEO signals from) what the URL Inspection Tool shows, so that is what we can use to “judge” SEO signals and content. The discussion by Google is clear, factual, authoritative, and non-speculative.

@KranthiKiranGude has confirmed his canonical link was updated using the URL Inspection Tool, which, according to Google as the authority, is “all you need” to see how Google views a page from an SEO perspective.

Technical Summary

Google uses SEO signals from what can be seen in the URL Inspection Tool; Google Developers have clearly stated that these SEO signals can be analyzed directly with the URL Inspection Tool; and the JS changes @KranthiKiranGude made to the DOM are visible in the URL Inspection Tool. That’s “more than good enough”, in my view.

HTH

RGJ · August 2, 2020, 10:50am

Yes, that article indeed clearly states that they have seen dynamically inserted canonical tags behave exactly the same as if they were in the source code. You are right (and I should have read this more thoroughly the first time you posted it).

Although three of the four pages you referred to in this topic, including the one that gave us the answer, are even older than the article I posted.

neounix · August 2, 2020, 11:07am

OBTW @RGJ, sorry for the confusion about “not current”…

When I use the term “dated” or “not current” I am talking about concepts and ideas, not the physical date of any article.

Some people write articles with dates from “today” and the concepts are “dated” (and wrong) and some people have written articles from 10 years ago, which are still highly relevant today.

That is what I mean by “dated” or “not current”: it is based on concepts, not on the physical dates written on paper or on a web article. Sorry for any confusion in my reply from using the terms in this manner.

What is important, at least in my mind, is that we provided a solution for @KranthiKiranGude and he confirmed it works and based on your skeptical post, we both did some additional research for this issue.

We verified (1) that this method, changing the canonical link using Javascript, is valid; and that (2) Google developers have confirmed it; and (3) we have a way to confirm it as users, using the URL Inspection Tool (as @KranthiKiranGude used and shared with us).

All the best and thanks so much for the “back-and-forth” on this interesting topic and for helping make the solution even more valid and stronger.

I’m off to other tasks (still struggling to learn Ruby on Rails after over a decade of PHP coding), as this topic is fully “mission accomplished”.

Until next time… all the best!
