One of our clients found the following SEO-related issue, which I can reproduce here on meta.
When Discourse is visited by a crawler, it inserts pagination links of the form
?page=XX, like this:
<link rel="next" href="/t/slug/123?page=2">
<span itemprop='name'><b><a rel="next" itemprop="url" href="/t/slug/123?page=2">next page →</a></b></span>
These ?page=XX links redirect to a /t/slug/123/NN URL, where NN is a post number. For example:
https://meta.discourse.org/t/post-rate-limit-trigger-for-a-topic-thats-heating-up/98294?page=2 redirects to
https://meta.discourse.org/t/post-rate-limit-trigger-for-a-topic-thats-heating-up/98294?page=3 redirects to
However, sometimes that page contains a canonical URL that does not correspond to the page that was originally requested.
In the above example,
?page=3 redirects to post 45, and the page for post 45 contains a canonical URL that points to ?page=2:
<link rel="canonical" href="https://meta.discourse.org/t/post-rate-limit-trigger-for-a-topic-thats-heating-up/98294?page=2" />
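For anyone who wants to check other topics for the same mismatch, here is a minimal sketch in Python. It only parses HTML that you have already fetched (for example, the final response body after following the redirect with a crawler user agent) and compares the ?page= value in the canonical link with the one that was requested. The names CanonicalFinder, page_of and canonical_matches are my own and are not part of Discourse.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<link ... />) also end up here by default.
        attr = dict(attrs)
        if tag == "link" and attr.get("rel") == "canonical" and self.canonical is None:
            self.canonical = attr.get("href")


def page_of(url):
    """Return the ?page= value of a URL as an int (1 when it is absent)."""
    values = parse_qs(urlparse(url).query).get("page")
    return int(values[0]) if values else 1


def canonical_matches(requested_url, html):
    """True when the canonical URL found in `html` names the requested page."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return False
    return page_of(finder.canonical) == page_of(requested_url)
```

Feeding it the canonical tag from the example above together with the requested ?page=3 URL returns False, which is the mismatch the SEO report complains about.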
Quoting from the client's SEO report:
This creates a chain of canonical tags and redirects and Google will begin to not trust the canonical tags put in place.
I can’t seem to figure out where this comes from. Sometimes the canonical URL has a higher page number, sometimes it refers to a lower page number (like in this example), and sometimes it’s correct.
I thought it might have to do with deleted posts or whispers, but a topic with lots of removed posts does not necessarily show this behavior. For example,
https://meta.discourse.org/t/topic-list-previews/101646?page=5 goes to post 471 (!), which has the correct canonical URL
?page=5. I think this might be an off-by-one error.
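To illustrate the kind of off-by-one error I have in mind, here is a purely hypothetical sketch. It is not Discourse's actual code, and the page size of 20 is an assumption on my part. It shows how a page calculation that is shifted by one position would assign the first post of a page to the previous page, while every other post on that page still gets the right number. Positions here are 1-based positions in the visible post stream, not post numbers, since the two differ once posts are deleted or hidden.

```python
PAGE_SIZE = 20  # assumed number of posts per crawler page (not confirmed)


def page_for_position(position, page_size=PAGE_SIZE):
    """Correct 1-based page for a 1-based position in the post stream."""
    return (position - 1) // page_size + 1


def page_for_position_shifted(position, page_size=PAGE_SIZE):
    """Hypothetical bug: the position is shifted by one before dividing."""
    return max(1, (position - 2) // page_size + 1)


def first_position_of_page(page, page_size=PAGE_SIZE):
    """1-based stream position of the first post on a given page."""
    return (page - 1) * page_size + 1


first = first_position_of_page(3)
print(first, page_for_position(first), page_for_position_shifted(first))
```

With these assumptions, the first post of page 3 is reported as page 2 by the shifted version, which resembles the ?page=3 to ?page=2 mismatch above. It does not explain the cases where the canonical URL is correct or has a higher page number, so treat it as an illustration only.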