Send a canonical link header instead of a noindex header.
Sending a canonical header likely has the same crawl-budget advantage as sending a noindex header - without the SEO downside of noindex, which excludes URLs that might have backlinks.
> If you can configure your server, you can use a rel="canonical" HTTP header (rather than an HTML tag) to indicate the canonical URL for a document supported by Search, including non-HTML documents such as PDF files.
We can configure our server.
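For concreteness, here is a minimal sketch of what sending that header could look like, using Python's stdlib http.server; the host, port, and page body are placeholders, and a real deployment would set this in the web server or app instead:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

CANONICAL_BASE = "https://example.com"  # hypothetical canonical host

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # The same signal as <link rel="canonical" href="...">, delivered
        # as an HTTP header instead of an HTML tag:
        self.send_header(
            "Link", f'<{CANONICAL_BASE}{self.path}>; rel="canonical"'
        )
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<html><body>page body</body></html>")

if __name__ == "__main__":
    HTTPServer(("", 8000), Handler).serve_forever()
```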
Does *use a rel="canonical" HTTP header (rather than an HTML tag)* emphasise a preference for the HTTP header solution?
Googlebot handles noindex headers very elegantly. Google's advice is to leave as many routes as possible open and to use headers for high-fidelity rules about what gets indexed.
Maybe Google handles canonical link headers just as elegantly as noindex headers.
This is rolling back the recently introduced site setting into what's basically a noop (moving what we send as a head tag into a header; no semantic change). After the change we send:

- header canonical
- html link-tag canonical (might be ignored - but anyway the same as the header)
The whole idea behind this:
Set canonical as an HTTP header to get the same benefit as the noindex HTTP header - namely faster crawling.
This might make noindex, with its uncertain implications, obsolete (a before/after sketch is below).
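As a hedged before/after sketch - the helper names and the topic URL are made up, and the real change would live in the server configuration:

```python
def headers_before() -> dict[str, str]:
    # Current approach: keep the page out of the index entirely.
    return {"X-Robots-Tag": "noindex"}

def headers_after(canonical_url: str) -> dict[str, str]:
    # Proposal: drop noindex and point crawlers at the canonical URL instead.
    return {"Link": f'<{canonical_url}>; rel="canonical"'}

print(headers_before())
print(headers_after("https://example.com/t/some-topic/123"))
```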
Another point on noindex vs. canonical:
noindex is more than a very strong signal not to put the page into the search index.
But with noindex, the page content is still processed by Googlebot to extract links (there is the extra option nofollow to disable this).
canonical is a strong signal that the content to be crawled lives at some other, canonical URL.
If Googlebot decides to accept this signal for a page, there is a good chance it does not process the page content at all and only processes the canonical URL (a sketch of this behavior follows).
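Here is a hedged sketch of that crawler-side behavior; it encodes my reading of the claims above, not Google's actual pipeline, and every function in it is hypothetical:

```python
import re

def extract_links(body: str) -> list[str]:
    """Stub: pull hrefs out of the HTML (a real crawler uses a proper parser)."""
    return re.findall(r'href="([^"]+)"', body)

def index_page(url: str, body: str) -> None:
    """Stub standing in for 'put this page into the search index'."""
    print(f"indexing {url} ({len(body)} bytes)")

def crawl(url: str, headers: dict[str, str], body: str) -> list[str]:
    """Return the URLs to crawl next, per the signals discussed above."""
    link = headers.get("Link", "")
    if 'rel="canonical"' in link:
        canonical = link.split(">")[0].lstrip("<")
        if canonical != url:
            # Strong signal: the content lives elsewhere; the body may be
            # skipped entirely and only the canonical URL processed.
            return [canonical]
    robots = headers.get("X-Robots-Tag", "")
    if "noindex" in robots:
        if "nofollow" in robots:
            return []  # not indexed, and links are not extracted either
        return extract_links(body)  # not indexed, but links still followed
    index_page(url, body)  # normal case: index the page and follow its links
    return extract_links(body)

print(crawl(
    "https://example.com/a",
    {"Link": '<https://example.com/canonical>; rel="canonical"'},
    '<a href="https://example.com/b">b</a>',
))
```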
This is a ‘thought experiment’. It is implemented nowhere - and I would never recommend implementing it (a concrete sketch follows the list):
- header noindex
- html meta-tag noindex (instead of: html link-tag canonical)
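Spelled out as a concrete response, purely as a sketch of the thought experiment (placeholder body; again, not a recommendation):

```python
HEADERS = {
    "X-Robots-Tag": "noindex",  # header noindex
    "Content-Type": "text/html",
}
BODY = """<html>
  <head>
    <!-- html meta-tag noindex, in place of the canonical link-tag -->
    <meta name="robots" content="noindex">
  </head>
  <body>page body</body>
</html>"""

for name, value in HEADERS.items():
    print(f"{name}: {value}")
print()
print(BODY)
```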
This change is not a ‘noop’:
Google might handle headers and HTML content at different stages of its processing queues. By sending headers we might skip later queues (e.g. the render queue) and thereby free up crawl budget for more important pages.
Not strongly against this, but it feels so minor. Google is always downloading content these days; I doubt saving an HTML parse is really going to make any material difference.
Lots of other areas need focus first; the microdata is probably the first place that needs TLC.