Google indexing same page multiple times: Issue with canonicals

Try putting this into Google:

site:forum.hearingtracker.com/t/costco-kirkland-signature-9-0-product-information/45380

Here are the results you will see (48 results for a single page):

This is a duplicate content bug, and should be considered an urgent SEO issue for the Discourse platform (I am running v2.4.0.beta3 +4 currently).

I tried to understand why this is happening, and was surprised to find, on inspecting the page source, that the canonical link is updated as I scroll down the page:

Example:
<link rel="canonical" href="https://forum.hearingtracker.com/t/costco-kirkland-signature-9-0-product-information/45380?page=2" />

So, I guess the rationale here is that long threads are paginated, but since this is a lazy-loading SPA, the canonicals are behaving as if traditional pagination is happening. I’m honestly not sure what the rationale is for doing it this way.
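For anyone who wants to check this without scrolling and re-inspecting by hand, here is a minimal sketch that pulls the canonical link out of a page's HTML. It uses only Python's standard library; the names `CanonicalFinder` and `find_canonicals` are made up for this illustration, and the sample string is just the tag quoted above.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <link ... /> are routed here as well.
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return a list of canonical URLs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# The tag quoted in the post above, used as sample input.
sample = (
    '<link rel="canonical" href="https://forum.hearingtracker.com/t/'
    'costco-kirkland-signature-9-0-product-information/45380?page=2" />'
)
print(find_canonicals(sample))
```

Feeding it the HTML saved from different scroll positions (or from the `?page=N` URLs) should show whether the canonical changes from page to page.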

By the way, SEO best practices for pagination are to NOT provide identical meta description and title on page 2, etc. Here’s an example of how I deployed pagination on another part of my website:

Questions:

  • What is the SEO rationale for providing canonicals pointing to paginated points in the thread?
  • If this approach is justified somehow, can we at least ensure that title and meta are not yielding duplicate results in the Google SERPs?

This is wrong.

For crawlers, Discourse serves pages that are 20 posts long, so every single post can be crawled just fine. There is no SPA for bots.

It has over 800 posts, so that is expected.
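As a rough sanity check on that number, the arithmetic can be sketched as below. The only input taken from this thread is the 20-posts-per-page figure; the function name is hypothetical, and the real crawler view may count posts differently (for example around deleted posts).

```python
import math

# Page size for the crawler view, as stated in the reply above.
POSTS_PER_CRAWLER_PAGE = 20


def crawler_page_count(post_count, per_page=POSTS_PER_CRAWLER_PAGE):
    """Number of crawler pages needed to show post_count posts."""
    if post_count <= 0:
        return 0
    return math.ceil(post_count / per_page)


print(crawler_page_count(800))  # 40
print(crawler_page_count(960))  # 48
```

So a topic with somewhat more than 800 posts landing in the 40 to 50 page range is what this model predicts, which is in line with the 48 results reported.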

Why? What is the actual problem? If you do a real world search for a word, are we linking to a page that doesn’t contain the word or something like that?


Sorry, SPA may be the wrong term, I just meant that a Discourse thread kind of behaves like a single page app in the sense that pagination is happening dynamically…

Yes, I guess it does make sense. I tried searching for some text on page 3, Google brought me to page 3, so that seems good. Wrong spot on the page, but seems probably as close as we can get in this situation.

So in retrospect, I guess having the pagination canonicals does make sense on longer threads, but if you look at the best practices for SEO, the guidance is to not allow Google to index paginated content with identical title and meta description. I guess the solution here is to change the title and meta on successive pages. See:

Source: https://www.searchenginejournal.com/seo-friendly-pagination/275557/
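To make the suggestion concrete, here is a minimal sketch of what "change the title and meta on successive pages" could look like. The exact wording (" - Page N" and the "Page N of the discussion" prefix) is purely an assumption for illustration; it is not how Discourse builds these tags, and it is not the formula from the linked article.

```python
def paginated_title(title, page):
    """Title for a given crawler page; page 1 keeps the original title."""
    if page <= 1:
        return title
    # Hypothetical suffix format, chosen only for this example.
    return f"{title} - Page {page}"


def paginated_description(description, page):
    """Meta description for a given crawler page; page 1 is unchanged."""
    if page <= 1:
        return description
    # Hypothetical prefix format, chosen only for this example.
    return f"Page {page} of the discussion: {description}"


print(paginated_title("Costco Kirkland Signature 9.0 Product Information", 1))
print(paginated_title("Costco Kirkland Signature 9.0 Product Information", 2))
```

The point is only that page 2 and later would no longer share an identical title and description with the root page.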

How would that be better for the forum’s human visitors? Do you not think some might become confused if they think they’re going to, e.g., “page 2” and land on an area of “the only page”? Might they look in vain for pagination navigation that isn’t there?

2 Likes

I’d rather have more confused visitors than less traffic from Google. Duplicate content is a real SEO problem, and the “loves” on your comment from two Discourse team members are seriously perplexing.

I am not sure I’d think it serious enough to call it “a real SEO problem”. AFAIK, what happens without rel="canonical" is the search engines decide which result URL best matches the search instead of what a site might prefer to be the result URL (the canonical).

I think you may have skipped over the image I posted above… Here’s the text:

John Mueller commented, “We don’t treat pagination differently. We treat them as normal pages.”

Meaning paginated pages are not recognized by Google as a series of pages consolidated into one piece of content as they previously advised. Every paginated page is eligible to compete against the root page for ranking.

To encourage Google to return the root page in the SERPs and prevent “Duplicate meta descriptions” or “Duplicate title tags” warnings in Google Search Console, make an easy modification to your code.

If the root page has the formula:

[Image: root page SERP]

The successive paginated pages could have the formula:

[Image: pagination page SERP]

These paginated URL page titles and meta description are purposefully suboptimal to dissuade Google from displaying these results, rather than the root page.

If even with such modifications, paginated pages are ranking in the SERPs, try other traditional on-page SEO tactics such as:

  • De-optimize paginated page H1 tags.
  • Add useful on-page text to the root page, but not paginated pages.
  • Add a category image with an optimized file name and alt tag to the root page, but not paginated pages.

Ah, thanks, I did miss that the concern isn’t about duplicate content but rather duplicate title and meta description warnings.

For Discourse at least, those are more like “notices” than warnings. Kind of like “if you didn’t know about this, you should check to make sure it’s OK and, if not, fix it”. You can safely ignore those, as a topic discussion shouldn’t meander so much that what would be appropriate for the first posts would not apply to all further posts in the topic.

For example, if the “page 1” posts are about “round red widgets” and by “page 2” the posts are about “square green sprockets”, members should be urged to stay on topic or the discussion should be split into separate topics.