One of our clients found the following SEO-related issue, which I can reproduce here on meta.
When Discourse is visited by a crawler, it inserts pagination links of the form ?page=XX,
like this:
<link rel="next" href="/t/slug/123?page=2">
and
<span itemprop='name'><b><a rel="next" itemprop="url" href="/t/slug/123?page=2">next page →</a></b></span>
When such a page is loaded in a browser, Discourse issues a JavaScript redirect to a /t/slug/123/NN
URL, where NN is a post number.
So https://meta.discourse.org/t/post-rate-limit-trigger-for-a-topic-thats-heating-up/98294?page=2
redirects to https://meta.discourse.org/t/post-rate-limit-trigger-for-a-topic-thats-heating-up/98294/23
and https://meta.discourse.org/t/post-rate-limit-trigger-for-a-topic-thats-heating-up/98294?page=3
redirects to https://meta.discourse.org/t/post-rate-limit-trigger-for-a-topic-thats-heating-up/98294/45
However, the page you land on sometimes contains a canonical URL that does not correspond to the page that was originally requested.
In the above example, ?page=3
redirects to post 45, and the page for post 45 contains a canonical URL pointing to ?page=2:
<link rel="canonical" href="https://meta.discourse.org/t/post-rate-limit-trigger-for-a-topic-thats-heating-up/98294?page=2" />
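To check this across many topics without clicking through by hand, something like the following could be used. It is only a rough sketch: it takes the HTML of the page you end up on (however you fetch it), pulls out the canonical link, and compares its ?page= value with the one that was requested. The names `CanonicalFinder`, `page_param` and `canonical_mismatch` are made up for this example, and treating a missing ?page= as page 1 is my assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "canonical" and self.canonical is None:
            self.canonical = attrs.get("href")


def page_param(url):
    """Return the ?page=N value of a URL as an int.

    Assumption: a URL without ?page= is treated as page 1.
    """
    values = parse_qs(urlparse(url).query).get("page")
    return int(values[0]) if values else 1


def canonical_mismatch(requested_url, html):
    """True if the canonical page differs from the requested page,
    False if they agree, None if the HTML has no canonical link."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    return page_param(requested_url) != page_param(finder.canonical)
```

Fed the canonical tag quoted above together with the ?page=3 URL that was requested, `canonical_mismatch` reports a mismatch.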
Quoting from the client's SEO report:
This creates a chain of canonical tags and redirects and Google will begin to not trust the canonical tags put in place.
I can’t seem to figure out where this comes from. Sometimes the canonical URL has a higher page number, sometimes it refers to a lower page number (like in this example), and sometimes it’s correct.
I thought it might have to do with deleted posts or whispers, but a topic with lots of removed posts does not necessarily show this behavior. For example, https://meta.discourse.org/t/topic-list-previews/101646?page=5
goes to post 471 (!), which has the correct canonical URL ?page=5.
I think this might be an off-by-one error.
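To illustrate what I mean by off-by-one, here is a purely hypothetical sketch. I have not looked at how Discourse actually computes the canonical page; the page size of 20 and both function names are assumptions made for this example. The point is that the same formula gives a different page for the first post of a page depending on whether it is fed a 1-based position or a 0-based index, and the first post of a page is exactly where the redirect lands.

```python
import math

# Assumption: 20 posts per page. The real value is not stated in this post.
POSTS_PER_PAGE = 20


def page_from_position(position):
    """Page for a 1-based position in the topic's visible post stream."""
    return math.ceil(position / POSTS_PER_PAGE)


def page_from_index_buggy(index):
    """Hypothetical bug: the same formula applied to a 0-based index."""
    return math.ceil(index / POSTS_PER_PAGE)


# First post of page 3: 1-based position 41, 0-based index 40.
print(page_from_position(41))     # 3
print(page_from_index_buggy(40))  # 2, one page too low
# The next post (0-based index 41) comes out right again.
print(page_from_index_buggy(41))  # 3
```

This would only explain a canonical URL that is one page too low on a page boundary. It does not account for the cases where the canonical page number is higher than the requested one.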