Send canonical link header instead of noindex header

Send a canonical link header instead of a noindex header.

Sending a canonical header likely has the same crawl-budget advantage as sending a noindex header, without the SEO implications of using noindex to exclude URLs that might have backlinks.


See also: Consolidate Duplicate URLs with Canonicals (Google Search Central documentation)

If you can configure your server, you can use a rel="canonical" HTTP header (rather than an HTML tag) to indicate the canonical URL for a document supported by Search, including non-HTML documents such as PDF files.

  • :+1: We can configure our server.
  • Does the wording "use a rel="canonical" HTTP header (rather than an HTML tag)" emphasise a preference for the HTTP header solution?

From #11553

Googlebot handles no-index headers very elegantly. It advises to leave as many routes as possible open and uses headers for high fidelity rules regarding indexes.

Maybe Google handles canonical link headers as elegantly as it handles noindex headers.


I am struggling with this. Reading Google's recommendation, it seems Google does not particularly care which of the two is used.

The recommendations for the rel="canonical" HTTP header are the same as rel="canonical" link tag.

I guess there is not much to lose, and it is possible that a mix of noindex plus rel canonical is the right Google recipe. But I am just not sure.

@Falco ?

This rolls back the recently introduced site setting into what's basically a noop (moving what we send as a head tag into a header, with no semantic change).

I do not want this change as is.


For the new default, SiteSetting.allow_indexing_non_canonical_urls = false, this is how it is implemented right now, and it stays this way:

  • header noindex
  • html link-tag canonical (might be ignored)

Without the patch and SiteSetting.allow_indexing_non_canonical_urls = true:

  • – no header –
  • html link-tag canonical

With the patch and SiteSetting.allow_indexing_non_canonical_urls = true:

  • header: Link: <https://forum.example.com/t/test-example/1234>; rel="canonical"
  • html link-tag canonical (might be ignored - but anyway same as header)
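The three cases above can be summarised in a small Ruby sketch. This is not the code from the PR: `indexing_signals` and its `with_patch:` flag are hypothetical names used only for illustration, and `X-Robots-Tag` is assumed to be the header meant by "header noindex".

```ruby
# Hypothetical helper, NOT Discourse's actual implementation.
# Returns the signals sent for a non-canonical URL in each case listed above.
def indexing_signals(allow_indexing_non_canonical_urls:, with_patch:, canonical_url:)
  headers = {}

  if !allow_indexing_non_canonical_urls
    # Default setting: noindex header, unchanged by the patch.
    # Assumption: "header noindex" means the X-Robots-Tag header.
    headers["X-Robots-Tag"] = "noindex"
  elsif with_patch
    # Patch: the canonical URL is also sent as an HTTP Link header.
    headers["Link"] = %(<#{canonical_url}>; rel="canonical")
  end
  # Without the patch and with indexing allowed, no header is sent.

  # The HTML link tag is present in all three cases.
  { headers: headers, html_head: %(<link rel="canonical" href="#{canonical_url}">) }
end

signals = indexing_signals(
  allow_indexing_non_canonical_urls: true,
  with_patch: true,
  canonical_url: "https://forum.example.com/t/test-example/1234"
)
puts signals[:headers]["Link"]
```

The header and the link tag carry the same URL, so the only thing the patch changes is where the signal is delivered.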

The whole idea behind this:
Send the canonical URL as an HTTP header to get the same benefit as the noindex HTTP header, namely faster crawling.
This might make noindex, with its uncertain implications, obsolete.

Another point on noindex vs. canonical:

  • noindex is more than a very strong signal not to put the page into the search index.
    But with noindex the page content is still processed by Googlebot to extract links (there is the extra option nofollow to disable this).
  • canonical is a strong signal that the content to be crawled is at some other, canonical URL.
    If Googlebot decides to accept this signal for a page, there is a good chance it does not process the page content at all and only processes the canonical URL.

This is a 'thought experiment'. It is not implemented anywhere, and I would never recommend implementing it:

  • header noindex
  • html meta-tag noindex (instead of: html link-tag canonical)

– OR –

  • – no header –
  • html meta-tag noindex

Why, or why not, implement it like this?

This change is not a 'noop':
Google might handle headers and HTML content at different stages of its processing queues. By sending headers we might skip later queues (e.g. the render queue) and thereby free up crawl budget for more important pages.

See: Advanced Guide to How Google Search Works (Google Search Central documentation)

(The only graph of the processing queue I've found: Understand JavaScript SEO Basics, Google Search Central documentation)
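To illustrate the point about headers, here is a rough Ruby sketch of how a client could read both signals from the response headers alone, without parsing any HTML. It says nothing about how Google actually works; `header_signals` is a hypothetical name, and the Link parsing is deliberately simplified (it assumes the `rel` parameter follows the URL and does not handle every valid Link header form).

```ruby
# Hypothetical helper for illustration only; not Google's or Discourse's code.
# Extracts noindex and canonical signals from response headers.
def header_signals(headers)
  # Header names are case-insensitive, so normalise them first.
  normalized = headers.to_h { |name, value| [name.to_s.downcase, value.to_s] }

  robots = normalized.fetch("x-robots-tag", "")
  link = normalized.fetch("link", "")

  # Simplified match: <URL> followed by rel="canonical" within the same
  # comma-separated link entry.
  canonical = link[/<([^>]*)>\s*;[^,]*\brel\s*=\s*"?canonical"?/i, 1]

  {
    noindex: robots.downcase.strip.split(/\s*,\s*/).include?("noindex"),
    canonical: canonical
  }
end

signals = header_signals(
  "Link" => '<https://forum.example.com/t/test-example/1234>; rel="canonical"'
)
puts signals[:canonical]
```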

The noindex change has been rolled back recently:

Could you take another look at this PR:

Not strongly against this but it feels so minor. Google is always downloading content these days, I doubt saving an HTML parse is really going to make any material difference.

Lots of other areas need focus first, the microdata is probably the first place that needs TLC.