The motivation is described here:
There should be only one URL in the canonical tag for a topic. When I scroll down, the canonical tag shows a URL like ‘https://xxx.com/t/some-topic?page=3’. Instead, it should be ‘https://xxx.com/t/some-topic’.
Hope this can be fixed.
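To make the request concrete, here is a minimal sketch of the normalization being asked for: dropping the `page` query parameter so every page of a topic maps to one URL. The function name `strip_page_param` is hypothetical and this is not how Discourse builds its canonical tag; it only illustrates the desired result.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_page_param(url: str) -> str:
    """Return the URL without its `page` query parameter (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every query parameter except `page`.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


print(strip_page_param("https://xxx.com/t/some-topic?page=3"))
```

Other query parameters, if any, are left in place; only `page` is removed.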
I think you are looking at it from a human visitor's perspective and not from how a search bot sees a Discourse forum.
Search bots see topics as paged. That is, posts 1 - 20 are in some-topic?page=1, posts 21 - 40 in some-topic?page=2, etc.
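As a rough illustration of that paging, the sketch below maps a post number to the crawler page it would land on. The page size of 20 is taken from the description above; the function names are made up for this example and do not come from Discourse.

```python
POSTS_PER_PAGE = 20  # assumption: page size as described in the post above


def crawler_page_for_post(post_number: int, per_page: int = POSTS_PER_PAGE) -> int:
    """Return the 1-based crawler page that contains the given post."""
    if post_number < 1:
        raise ValueError("post numbers start at 1")
    return (post_number - 1) // per_page + 1


def crawler_url(topic_url: str, post_number: int, per_page: int = POSTS_PER_PAGE) -> str:
    """Build the paged URL for a post (hypothetical helper)."""
    return f"{topic_url}?page={crawler_page_for_post(post_number, per_page)}"


print(crawler_url("https://xxx.com/t/some-topic", 21))
```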
Changing how it works now would not be ideal.
Hmmm, reading the following, I think the ?page=X should not be included in the canonical tag
You want the topic page itself to get the link juice, not the specific page where the content is located. However, I’m not an SEO expert by any means, and am purely stating this based on my reading of the above article.
There should be only one canonical URL for a single post.
I prefer to go by what Google says
It looks like having “next” and “prev” might help, even though “doing nothing” and letting Google decide should work too.
That’s how it is now
You guys commenting on this, be SURE you are viewing the site like Google does, with a Google webcrawler user agent. That is not the view you are seeing in your browser right now as a normal user.
Chrome dev tools -> Network -> more tools -> Network conditions
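If you would rather script the check than use dev tools, something like the sketch below might do. It assumes the site serves its crawler view based on the User-Agent header alone; the helper names `find_canonical` and `fetch_as_googlebot` are invented for this example.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.request import Request, urlopen

# Commonly published Googlebot user agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        if "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL found in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def fetch_as_googlebot(url: str) -> str:
    """Download a page while sending the Googlebot user agent."""
    request = Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `find_canonical(fetch_as_googlebot(url))` on a topic or category URL should then give the canonical URL a crawler is shown, under the assumption above.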
<link rel="canonical" href="https://meta.discourse.org/t/import-posts-from-facebook-group-into-discourse/6089?page=2" />
Here, this page
<link rel="canonical" href="https://meta.discourse.org/c/howto/devs" />
So it looks like Discourse is doing the right thing, but for whatever reason, Google isn’t using the canonical.
Maybe adding a “noindex” to all non-canonical pages would help steer Google in the right direction?
Wouldn’t that mean that Google is no longer seeing the content on these pages?
Only at that URL path. Google would still see the content but it would be indexed only at the canonical path. i.e. no “same content at more than one URL” problems.
But in the crawler view, the canonical path isn’t infinite scrolling, but shows only page 1, right?
(Of course, Google will still find all posts, we’re just talking about the tag pages.)
There is some confusion in this topic. At times the focus has been more about Posts in Topics, other times Topics in List pages.
Admittedly, I have a biased view. I feel that in most cases “SEO takes care of itself” and there is no need to worry about tweaking things. That is, as long as the content is there, it will eventually get found. (and for Discourse, usually sooner rather than later).
I have not spent a great deal of time checking on how Googlebot sees Discourse. I prefer to work on development (in particular, plugins at the moment). But don’t let that stop you from using your browser’s “as googlebot” to research if that is where your interest lies.
Back to the original issue, I wonder if Google is getting that URL based on its existence on the Category Crawler view, even though there is no second page of results.
Repro on Try
- Visit https://try.discourse.org
- Open Dev Tools, go to Network > Network Conditions, enable User Agent as Googlebot
- Refresh the page
- Click on any category
- You will see a next page link; clicking on it results in a blank page (every category on Try is doing this)
Link of next page: discourse - Demo
It seems to me, the next page link should only be visible when it applies.
Also note, this happens on categories that only have 1 topic, so this isn’t an off-by-one error.
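The rule I have in mind is roughly the following. This is only a sketch: the 1-based page numbering and the page size of 30 are assumptions for illustration, not how Discourse actually pages its category crawler view.

```python
def has_next_page(total_topics: int, page: int, per_page: int = 30) -> bool:
    """Return True only if topics remain beyond the given 1-based page.

    Assumption: per_page=30 is a placeholder, not Discourse's real setting.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return page * per_page < total_topics


# A category with a single topic should not offer a next page link.
print(has_next_page(total_topics=1, page=1))
```

With a check like this, the link would be rendered only when a following page has something on it.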
Canonical tag on topic URL
By the way, ?page=X on canonicals is also causing duplicate content warnings from Google.
At least on my current installation (v1.8.0.beta9 +132) Discourse is adding ?page=X to canonicals, which in my humble opinion should not be the case.
In my case, Google is showing me 105 topics with duplicate content in Webmaster Tools, which is probably hurting SEO. Example:
<link rel="canonical" href="https://comunidad.hipertextual.com/t/a-que-videouegos-estais-jugando/79?page=3"/>