Canonical tag generated with "?page=2"?


(Charles Walter) #21

Based on what I read in the thread, I thought the canonical URL for a response would be grouped with a page number, but instead it appears that no canonical URL was found (or perhaps this will be taken care of in the next release?).

http://www.helloforos.com/t/visa-u-victimas-de-crimen/131/281

(Benjamin Blackmer) #22

@charleswalter, I’m not seeing the canonical tag on your site. That’s odd. Not sure what’s up there.

However, it is still odd to me that the canonical tag includes a page number in the URL. That seems to go against what canonical is used for.


(Mittineague) #23

I understand it as the way Google sees Discourse “pages”.

For example these two links to different posts
…/whatever/1234/5
…/whatever/1234/12
are both part of “page 1” and should not be indexed as separate pages.


(Benjamin Blackmer) #24

I’m slowly understanding that this is how Discourse is set up to show
topics to Google.

However, since the topics are displayed as one infinite page, it’s odd to
me that each 20 posts are separated into pages. It’s effectively breaking
up long topics into multiple separate topics, which dilutes the weight of
the topic itself, and this is different than how the topic is displayed to
users.


(Mittineague) #25

Kind of. Pages are still displayed 20 posts at a time to users with JavaScript enabled. But more content is loaded upon scroll so that a topic looks to be one long topic.

TBH I don’t know if giving everything to Google is possible the way that Discourse works.

AFAIK forum applications have all used pagination of long threads.
I don’t know if it caused “dilution”.

So basically what is being desired here is to somehow extend the benefit of Discourse’s infinite scroll feature that users enjoy so that Discourse will have a supposed SEO advantage over other forum applications?


(Benjamin Blackmer) #26

That could be an outcome, but not why I was writing.

The pagination itself is fine, but I don’t know why it is necessary in the canonical tag. Splitting it up like that definitely dilutes the overall topic, because instead of one large topic, Google sees multiple 20-post topics.


(Charles Walter) #27

I think ultimately what would be desired is for the site admin to have the option to choose how they want canonicals to be rendered for their site for topics…

Topic canonical URLs

  • Render canonical URL for topic and each reply
  • Render unique canonical for topic and every 10 replies
  • Render one canonical for each topic

I haven’t yet looked at category URLs.

The idea being that yes, this could provide additional SEO advantage over other forum applications.

To truly get the benefit, the TITLE tag should also reflect the canonical so that spiders see the titles as unique.

If each reply can be indexed, then the titles should render like this:
Topic title - Reply 2
Topic title - Reply 3
Topic title - Reply 4

If each 10 replies are indexed, then the titles should render like this:
Topic title - Page 2
Topic title - Page 3

Personally, I would opt for rendering one canonical for each topic, especially since we’re not providing different content material to the spiders. If users want to search in the topic for the specific term they had queried, they can do that once they get to the topic. If Discourse started providing search engines with distinct content per page, then I would change my opinion and go for the pagination approach.

The way things are today, I do believe that the indexing of the individual replies is diluting the SEO potential of the topics.


(Mittineague) #28

According to Google, that is mistake #1 (bold mine)

http://googlewebmastercentral.blogspot.com/2013/04/5-common-mistakes-with-relcanonical.html

Mistake 1: rel=canonical to the first page of a paginated series

Imagine that you have an article that spans several pages:

example.com/article?story=cupcake-news&page=1
example.com/article?story=cupcake-news&page=2
and so on

Specifying a rel=canonical from page 2 (or any later page) to page 1 is not correct use of rel=canonical, as these are not duplicate pages. Using rel=canonical in this instance would result in the content on pages 2 and beyond not being indexed at all.
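
To make the quoted mistake concrete, here is a minimal sketch of a check for it. It assumes the page number lives in a `page` query parameter, as in Google's example URLs, and that a missing parameter means page 1; the function names are made up for illustration and are not part of Discourse.

```python
from urllib.parse import urlsplit, parse_qs


def page_of(url):
    """Return the `page` query parameter as an int (assumed to default to 1)."""
    query = parse_qs(urlsplit(url).query)
    return int(query.get("page", ["1"])[0])


def points_to_first_page(page_url, canonical_url):
    """True when a later page names page 1 as its canonical URL."""
    return page_of(page_url) > 1 and page_of(canonical_url) == 1


base = "https://example.com/article?story=cupcake-news"
# Page 2 declaring page 1 as canonical is the mistake described in the quote.
print(points_to_first_page(base + "&page=2", base + "&page=1"))
# A self-referencing canonical on page 2 is not flagged.
print(points_to_first_page(base + "&page=2", base + "&page=2"))
```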


(Jeff Atwood) #29

I think what is being requested in this topic directly contradicts what Google tells you to do.

So, all due respect, but I think we’ll go with the official Google recommendation here on paginated content :smiley:


(Charles Walter) #30

I just did some testing on
http://www.browseo.net/

I hadn’t realized Discourse was indeed getting different content indexed on the pages. That’s good. Still, it would be helpful to have unique titles on the pages so that Google recognizes them as unique, or to institute the prev/next convention.

@codinghorror, there is still a mistake on the replies. These URLs are getting indexed as duplicate content.

Example URL:
http://www.helloforos.com/t/chic-s-me-llego-la-inspiracion-hallowinesca-y-me-fui-de-compras/20920/19
Note that there is no canonical and the search engine sees the URL exactly as it would the primary topic link.

BTW, I thought these resources were insightful:


@benblackmer if you want to get rid of page 2 from the crawlers, have you considered just going into Google Webmaster Tools to ignore the query parameter page=2?


(Benjamin Blackmer) #31

Hey @charleswalter, I really appreciate you following up on this. Thank you! This is precisely what we’ve done.


(Mittineague) #32

Sounds like a poor idea to me.
If Google sees every page as the same URL, wouldn’t it mess up the indexing?

Seems to me prev / next would be the better alternative if you think there’s a problem with the page variable. (I’m not convinced that there is a problem with the page variable)

Sorry, but this sounds like common SEO snake oil conjecture

Is there a Google source reference suggesting this?


(Charles Walter) #33

This article does make reference to Google, and it does a good job of summarizing it while adding some additional color.

Ignoring the query parameter just means that Google will ignore the uniqueness of the paginated page and focus its indexing effort on the main topic link.

My bigger concern is the thousands of other URLs being indexed for each reply URL.


(Sam Saffron) #34

The specific bug of missing canonicals is going to be fixed by me today.


(Mittineague) #35

Ah, so it does
http://googlewebmastercentral.blogspot.com/2014/02/infinite-scroll-search-friendly.html

To make sure that search engines can crawl individual items linked from an infinite scroll page, make sure that you or your content management system produces a paginated series (component pages) to go along with your infinite scroll.

Be sure that if a searcher came directly to this page, they could easily find the exact item they wanted (e.g., without lots of scrolling before locating the desired content).

Discourse already does this, i.e. if someone is interested in post # 112 it will go to page 6 (posts 101 to 120).
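
As a rough sketch of that mapping (not Discourse's actual code): it assumes 20 posts per page, as discussed in this thread, and a `?page=N` suffix on the topic URL with page 1 left bare. The helper names and the example forum URL are hypothetical.

```python
POSTS_PER_PAGE = 20  # assumption taken from the thread, not from Discourse source


def page_for_post(post_number, posts_per_page=POSTS_PER_PAGE):
    """Return the 1-based page that contains the given 1-based post number."""
    return (post_number - 1) // posts_per_page + 1


def canonical_for_post(topic_url, post_number):
    """Build the page-level canonical URL for a post (hypothetical URL scheme)."""
    page = page_for_post(post_number)
    return topic_url if page == 1 else f"{topic_url}?page={page}"


topic = "https://forum.example.com/t/whatever/1234"
print(page_for_post(112))              # post 112 falls on page 6
print(canonical_for_post(topic, 5))    # posts 5 and 12 share the page-1 canonical
print(canonical_for_post(topic, 112))
```

With this scheme every post URL on the same page resolves to one canonical, so individual reply URLs do not show up as separate duplicates.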

Exactly my concern, by having everything go to page # 1 (posts 1 to 20) what happens to the posts in the rest of the topic?

AFAIK posts are not indexed, pages are.

Although the current canonical tags Discourse has work fine, it does look like using prev / next has some benefit (though possibly detriment as well) and IMHO is what you are looking for.

http://googlewebmastercentral.blogspot.com/2011/09/pagination-with-relnext-and-relprev.html

  • Consolidate indexing properties, such as links, from the component pages/URLs to the series as a whole (i.e., links should not remain dispersed between page-1.html, page-2.html, etc., but be grouped with the sequence).
  • Send users to the most relevant page/URL—typically the first page of the series.
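
For illustration only, a small sketch of how the prev / next link tags for one page of a topic could be generated. The `?page=N` URL scheme, the function names and the example forum URL are assumptions, not how Discourse renders its head tags.

```python
from html import escape


def page_url(topic_url, page):
    """URL of a given page; page 1 is assumed to be the bare topic URL."""
    return topic_url if page == 1 else f"{topic_url}?page={page}"


def prev_next_links(topic_url, page, total_pages):
    """Return the rel=prev / rel=next link tags for one page of a series."""
    links = []
    if page > 1:
        href = escape(page_url(topic_url, page - 1))
        links.append(f'<link rel="prev" href="{href}">')
    if page < total_pages:
        href = escape(page_url(topic_url, page + 1))
        links.append(f'<link rel="next" href="{href}">')
    return links


for tag in prev_next_links("https://forum.example.com/t/whatever/1234", 2, 3):
    print(tag)
```

The first page gets only a next link and the last page only a prev link, which is how the series boundaries are signalled.
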

(Kane York) #36

That’s not as good as it could be – ideally we could take you right to the post – but I have no idea how to improve it :confused:


(Mittineague) #37

Maybe the post article Ids could be used as Get variables?
Though I suspect that might mess up routing and be more trouble than it’s worth.

<article data-user-id="6626" data-post-id="143362" id="post_36" class="boxed ">


(Kane York) #38

OK, so how do you get Google to pass those when the user clicks on a result?


(Sam Saffron) #39

The original … and severe issue of missing canonicals is now fixed per:

https://github.com/discourse/discourse/commit/a0524ea4d1aaa9481a97e82f64688f1a816addfb

@charleswalter will deploy helloforos shortly with the fix.


(Charles Walter) #40

Thx @Sam. Just checked this out and the replies now have the right canonicals.

I think this will help SEO for all Discourse sites and reduce the amount of indexing.