I want to update rel=canonical href using JavaScript

I have a few duplicate pages on my domain, and I need to point the canonical tag of each duplicate page to the original page using JavaScript. (Deleting the duplicate pages is not an option, as they get considerable traffic.)
Could someone suggest how to update the href of that tag using JavaScript in Discourse?

Here ya go @KranthiKiranGude, this is how you could change the href attribute in JavaScript. First you select the DOM element, then you change the attribute.

<script>
// Select the existing canonical <link> element, then change its href.
var uC = document.querySelector("link[rel='canonical']");
var newURL = "https://my.coolforum.com/newlink";
if (uC) {
   uC.setAttribute("href", newURL);
}
</script>

Of course, you will need some logic based on the page you want to operate on.

Generic example logic:

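<script>
// Placeholder values: replace with the real page path (or id) you want to match.
var currentPage = window.location.pathname;
if (currentPage === "my_interesting_page_url_or_id")
{
   var uC = document.querySelector("link[rel='canonical']");
   var newURL = "https://my.coolforum.com/newlink";
   if (uC) {
      uC.setAttribute("href", newURL);
   }
}
</script>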
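
If several duplicate pages are involved, the generic condition above could be driven by a small lookup table instead of one `if` per page. This is only a sketch: the paths and URLs in `canonicalByPath`, and the helper names `resolveCanonical` and `applyCanonical`, are made up for illustration and would need to be replaced with your real duplicate and original pages.

```javascript
// Hypothetical mapping: duplicate page path -> canonical URL of the original.
// These paths and URLs are placeholders, not real pages.
const canonicalByPath = {
  "/t/duplicate-topic/123": "https://my.coolforum.com/t/original-topic/45",
  "/t/another-duplicate/124": "https://my.coolforum.com/t/original-topic/45",
};

// Return the canonical URL for a path, or null if the page is not a known duplicate.
function resolveCanonical(pathname, map) {
  return Object.prototype.hasOwnProperty.call(map, pathname) ? map[pathname] : null;
}

// Update the existing canonical tag of `doc` when the given path is a known duplicate.
// Returns true when the href was changed, false otherwise.
function applyCanonical(doc, pathname, map) {
  const target = resolveCanonical(pathname, map);
  const link = doc.querySelector("link[rel='canonical']");
  if (!target || !link) return false;
  link.setAttribute("href", target);
  return true;
}

// Only runs in a browser, where `document` and `window` exist.
if (typeof document !== "undefined" && typeof window !== "undefined") {
  applyCanonical(document, window.location.pathname, canonicalByPath);
}
```

Pages that are not listed in the table are left untouched, so the same script can be loaded on every page.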

Hope this helps.


Hi @neounix,

I have tried your code, but instead of updating the href, a new script tag got generated:


I have added this script in the “</head>” section.


Hi @KranthiKiranGude

Please post the exact code you used and where exactly you added it, including a screen shot of the entry in the </head> section you mentioned.

Thanks!

It seems normal that you will see new JavaScript in the page when you add more JavaScript.

You will need to check the DOM in the web dev console (the elements), not in the page source code, BTW.
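
To check the live DOM rather than the page source, something like the following could be pasted into the browser console. `currentCanonical` is a hypothetical helper name; it simply reads back whatever the canonical tag holds after the scripts on the page have run.

```javascript
// Read the canonical href from a document (the live DOM when run in a browser).
// Returns null when the page has no canonical tag.
function currentCanonical(doc) {
  const link = doc.querySelector("link[rel='canonical']");
  return link ? link.getAttribute("href") : null;
}

// In the browser console this logs the value as the DOM currently has it.
if (typeof document !== "undefined") {
  console.log("canonical in DOM:", currentCanonical(document));
}
```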


Hi @neounix,


This is the script I have added. This is just to test it out.


I understand.

You are missing an opening quote in your script conditional statement BTW …


Hi @neounix,

It worked in the Dev Console. But in the Page Source it still references the actual URL.
If I am not wrong, search engines will read from the Page Source, not the DOM elements. Please correct me if I am wrong.
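
One way to see this difference directly is to compare the canonical in the raw HTML with the one in the DOM. A rough sketch, assuming the canonical tag is written as a single `<link ...>` element with quoted attributes; `canonicalFromSource` and `compareCanonicals` are illustrative names, and the regular expressions are a shortcut rather than a real HTML parser.

```javascript
// Extract the canonical href from raw HTML text (what "View Page Source" shows).
// Regex-based shortcut; assumes quoted attribute values.
function canonicalFromSource(html) {
  const tags = html.match(/<link\b[^>]*>/gi) || [];
  for (const tag of tags) {
    if (/\brel\s*=\s*["']canonical["']/i.test(tag)) {
      const href = tag.match(/\bhref\s*=\s*["']([^"']*)["']/i);
      return href ? href[1] : null;
    }
  }
  return null;
}

// Fetch the page source again and log it next to the live DOM value.
async function compareCanonicals() {
  const response = await fetch(window.location.href);
  const source = canonicalFromSource(await response.text());
  const link = document.querySelector("link[rel='canonical']");
  console.log({ source: source, dom: link ? link.getAttribute("href") : null });
}

// Only meaningful in a browser console, where `document` and `window` exist.
if (typeof document !== "undefined" && typeof window !== "undefined") {
  compareCanonicals();
}
```

If the script has run, the `dom` value should show the new URL while `source` still shows the one the server sent.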


I’m not sure about that, to be honest. I thought before that modern search engines (Googlebot) will read the DOM, but now that I think about it, it makes sense that search engines might only read the source and not the DOM.

But … when I Google to check this, it says:

SEO signals in the DOM (page titles, meta descriptions, canonical tags, meta robots tags, etc.) are respected. Content dynamically inserted in the DOM is also crawlable and indexable. Furthermore, in certain cases, the DOM signals may even take precedence over contradictory statements in HTML source code. This will need more work, but was the case for several of our tests.

Reference:


Hi @neounix,

Thanks a lot for your help. Let me research this part as well. I am really thankful to you.


Welcome!

Please post back and let us know the results of your research.

Another method, which I have been working on in my spare time lately, is to modify this Discourse Ruby lib file directly:

https://github.com/discourse/discourse/blob/master/lib/canonical_url.rb

You might consider something along that line if you have no joy with the DOM manipulation JS technique, @KranthiKiranGude


Hi @neounix,

I tested the page using the URL Inspection Tool, and Google is recognizing the updated URL.


Perfect… glad to hear it worked.

Thanks for testing and posting back.

PS: That JS DOM method is a lot easier than manipulating canonical_url.rb :slight_smile:


I’m not sure if overriding the canonical using JavaScript will work, since this is something that is more on the spider level (i.e. the part that retrieves and collects data) than on the indexer level (the part of a bot that interprets data and stores it in the search index).

Unsolicited advice: you might want to read this topic so you can put those overrides in a plugin:


Yeah, me too. The jury is still out on that one.

But Google searches on this topic yield a lot of fruit: many people do this, and many say Google respects the DOM changes (and some say it does not, so there does not seem to be a strong, overwhelming consensus on the topic), see for example:

I think if I was going to do it, I would (1) delete the original canonical tag from the page source and then (2) insert a new canonical tag in the DOM with JS.
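
The DOM half of that idea could look roughly like the sketch below. `replaceCanonical` is a hypothetical helper, and the URL is the placeholder from the earlier example. Note that this only removes the tag from the DOM; removing it from the page source itself would have to happen in the theme or on the server, which is not shown here.

```javascript
// Remove every existing canonical tag from `doc`, then add a fresh one to <head>.
// `newURL` is the URL of the original (non-duplicate) page.
function replaceCanonical(doc, newURL) {
  doc.querySelectorAll("link[rel='canonical']").forEach((old) => old.remove());
  const link = doc.createElement("link");
  link.setAttribute("rel", "canonical");
  link.setAttribute("href", newURL);
  doc.head.appendChild(link);
  return link;
}

// Only runs in a browser, where `document` exists.
if (typeof document !== "undefined") {
  // Placeholder URL, as in the earlier example in this topic.
  replaceCanonical(document, "https://my.coolforum.com/newlink");
}
```

Removing all existing canonical tags first avoids ending up with two conflicting canonical links in the rendered page.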

Then, over time we can simply look at the Google search console and see what Google selected as canonical.

See also:


Because many people consider this important for SEO, I checked on this again, in light of this confirmation by @KranthiKiranGude

According to developers.google.com, Understand the JavaScript SEO basics:

Googlebot supports web components. When Googlebot renders a page, it flattens the shadow DOM and light DOM content. This means Googlebot can only see content that’s visible in the rendered HTML. To make sure that Googlebot can still see your content after it’s rendered, use the Mobile-Friendly Test or the URL Inspection Tool and look at the rendered HTML.

Because (1) @KranthiKiranGude used the URL Inspection Tool in his testing and (2) he confirmed the canonical was changed as expected, it follows that, per Google, Googlebot does indeed “see” and register this DOM content change after the page is rendered.

Reference:

https://developers.google.com/search/docs/guides/javascript-seo-basics#web-components


Yeah, I totally support the idea of Google flattening the DOM contents like that while indexing.

But some/most meta tags have their semantics at the HTTP protocol level rather than at the HTML level, despite being present in the HTML. I emphasized the ‘while indexing’ because I am not sure it flattens the DOM like that and takes the updated canonical URL into account while crawling.

(To put it differently, I’m not sure if DOM contents also means ‘metadata embedded in the content’. Yes, it sees it that way, but I’m not sure if it will use it that way.)

Maybe this article explains it better: How Google Crawls Your Website and Indexes Your Content

When Google needs to crawls JavaScript sites, an addition stage is required that traditional HTML content doesn’t need. It is know as the rendering stage, which something takes additional time. The indexing stage and rendering stage are separate phases, which lets Google index the non-JavaScript content first.


Not really, sorry. That article by www.hillwebcreations.com does not even mention the DOM, how to inspect the DOM, etc., and at least to me, it reads as “dated and opinionated” (not really current, nor factual).

Personally, I prefer these two well-written references, which in my mind are more authoritative, factual, and well referenced:

https://developers.google.com/search/docs/guides/javascript-seo-basics#web-components

and the first one where they actually tested (and that was long before GoogleBot switched to a Chromium core which could read the DOM (Javascript) even better):

We Tested How Googlebot Crawls Javascript And Here’s What We Learned

After my research, I tend to agree with Google developers that they will index (and get their SEO signals from) what is found by using the URL Inspection Tool, and it is from this that we can “judge” SEO signals and content. The discussion by Google is clear, factual, authoritative, and non-speculative.

@KranthiKiranGude has confirmed his canonical link was updated using the URL Inspection Tool, which Google, as the authority, said is “all you need” to see how Google views a page from an SEO perspective.

Technical Summary

Because Google uses SEO signals from what can be seen from their URL Inspection Tool; and the fact that Google Developers have clearly stated that their SEO signals can be directly analyzed by the URL Inspection Tool; and the fact the JS changes @KranthiKiranGude made to the DOM are visible in the URL Inspection Tool, that’s “more than good enough”, in my view.

HTH


Yes, that article indeed clearly states that they have seen canonical tags that were dynamically inserted behave exactly the same as if they were in source code. You are right (and I should have read this more thoroughly the first time you posted it).

Although three of the four pages you referred to in this topic, including the one that gave us the answer, are even older than that article I posted :wink:

OBTW @RGJ, sorry for the confusion about “not current”…

When I use the term “dated” or “not current” I am talking about concepts and ideas, not the physical date of any article.

Some people write articles with dates from “today” and the concepts are “dated” (and wrong) and some people have written articles from 10 years ago, which are still highly relevant today.

That is what I mean by “dated” or “not current”, it is based on “concepts” not physical dates written on paper or a web article. Sorry for any confusion in my reply using the terms in this manner.

What is important, at least in my mind, is that we provided a solution for @KranthiKiranGude, he confirmed it works, and, prompted by your skeptical post, we both did some additional research on this issue.

We verified (1) that this method, changing the canonical link using Javascript, is valid; and that (2) Google developers have confirmed it; and (3) we have a way to confirm it as users, using the URL Inspection Tool (as @KranthiKiranGude used and shared with us).

All the best and thanks so much for the “back-and-forth” on this interesting topic and for helping make the solution even more valid and stronger.

I’m off to other tasks (still struggling to learn Ruby on Rails after over a decade of PHP coding), as this topic is fully “mission accomplished” :slight_smile:

Until next time… all the best!
