Adding Canonical Redirects for SEO Optimization

I apologize if there is an easy solution to this problem but I have searched this forum to no avail. I am trying to optimize our new Discourse forum for SEO, and it seems that multiple pages containing duplicate content can be accessed at different URLs, thus hurting search engine ranking by “splitting” the traffic for each duplicated page. So imagine we have some content at:

forum.foo.com/c/uncategorized

The issue is that this same exact page can also be accessed at:

forum.foo.com/c/uncategorized/l/latest?category_id=1&page=1

This means we need to add a canonical link from the second URL to the first by putting the tag <link rel="canonical" href="http://forum.foo.com/c/uncategorized"> in the page header whenever the URL “forum.foo.com/c/uncategorized/l/latest?category_id=1&page=1” gets loaded.

Is there any built-in support for this in Discourse? This issue seems to be really hurting our search engine rankings, as just about every category page has a duplicate URL at something like */l/latest?category_id=1&page=1. I do not mind doing some minor tinkering in the Ruby on Rails backend to get this done, but we would prefer not to dive into any complex hacks.
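
To make the mapping described above concrete, here is a minimal sketch in Python. It is not how Discourse does it; it only assumes that a category listing URL has the shape /c/<category> with an optional trailing /l/<filter> segment and an ignorable query string. The function names are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_category_url(url: str) -> str:
    """Return the bare category URL for a category listing URL."""
    parts = urlsplit(url)
    path = parts.path
    # Assumed URL shape: an optional trailing "/l/<filter>" segment, e.g. "/l/latest".
    head, sep, list_filter = path.rpartition("/l/")
    if sep and "/" not in list_filter.strip("/"):
        path = head
    # Drop the query string and fragment entirely.
    return urlunsplit((parts.scheme, parts.netloc, path.rstrip("/"), "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the page header."""
    href = escape(canonical_category_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'
```

Calling canonical_link_tag on either of the two URLs in this post should produce the same tag, pointing at http://forum.foo.com/c/uncategorized.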

Google is usually smart enough to understand that query strings can produce identical content with a near infinite number of variables used in said query strings.

Where is your proof that this is “hurting rankings”? Do you have data? There is a lot of snake oil in the SEO “industry”.

What is your URL so we can see this in action with the site and inurl operators?

You can manually specify how you want Google to interpret url parameters using Webmaster Tools: http://googlewebmastercentral.blogspot.com/2012/08/configuring-url-parameters-in-webmaster.html

As Jeff says though, for most sites I leave that setting at “Let Google decide how to interpret the url parameters” and it seems to work just fine.

However, this doesn’t apply when two pages have similar content but different urls (not counting the url parameters). For example, foo.com/category1 and foo.com/category1/latest will be seen as different pages, regardless of how you tweak the url parameter settings in WMT. The OP is correct that it’d be best to specify a canonical URL for any two pages that have distinct urls and identical or nearly identical content.

OK that seems fair… @techapj can you add this to your list: for all category pages, make sure that

https://meta.discourse.org/c/support/l/latest

has a rel="canonical" meta tag pointing to

https://meta.discourse.org/c/support

(Be sure this is in the crawler version of the page too.)

I suppose there is a minor hole here if you have the categories page set as the homepage in topnav, etc, but I’m ignoring that for now.

An example of two URLs with duplicate content:

http://forum.learntomod.com/c/uncategorized/l/latest?category_id=1&page=1

and

http://forum.learntomod.com/c/uncategorized

Thank you so much! Glad to hear an update is in the works.

Implemented via

https://github.com/discourse/discourse/pull/3235#issuecomment-75987533

Confirmed it looks good, in

https://meta.discourse.org/c/support/l/latest

I see

<link href="https://meta.discourse.org/c/support" rel="canonical" />
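
For anyone who wants to repeat this check on their own pages, here is a small sketch using only the Python standard library. It takes the HTML as a string (fetching the page is left out), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are routed here as well.
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html_text: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.canonicals
```

An empty result means the page declares no canonical; more than one entry usually points to a template problem.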

I have seen that the platform has a few SEO navigation and crawl issues. Here are the problems:

Point 1. You’ve implemented rel=‘next’/‘prev’, not as a meta tag but as an itemprop. You’ve put the canonical in properly only on Topics, not on Categories.
feature - Discourse Meta & feature - Discourse Meta both have the same canonical <link rel="canonical" href="https://meta.discourse.org/c/feature" />, whereas it should be different for each page.

Point 2. NOSCRIPT URL : Plugin for signatures? & Plugin for signatures?

PushState URL: Plugin for signatures? & Plugin for signatures?

Point 3. Refer to the answer to point 1.

Everything is implemented correctly per Google Webmaster guides. This was already covered in previous topics.

https://support.google.com/webmasters/answer/1663744?hl=en

Please cite specific words and sentences that are not being correctly met there, on that page.

Searchers commonly prefer to view a whole article or category on a single page. Therefore, if we think this is what the searcher is looking for, we try to show the View All page in search results. You can also add a rel=“canonical” link to the component pages to tell Google that the View All version is the version you want to appear in search results.

See above. “VIEW ALL”. That would be the category root page…

Jeff,

What you say is right, but the problem is that there is no “VIEW ALL” page in Discourse forums. I am citing examples from this very forum.

Let us take the Feature category. The URI is feature - Discourse Meta, and if you look at it with a user agent such as Googlebot/Bingbot/Slurp! you can see the listing is broken into pages: feature - Discourse Meta, then page=2, and so on up to page=72, each containing 30 links. All of these pages have the same canonical, <link rel="canonical" href="https://meta.discourse.org/c/feature" />, so you are leaving the other 71 pages out of the index.
You can even check the Google Cache, if you don’t believe me, at feature - Discourse Meta
or use the site: operator: site:https://meta.discourse.org/c/feature

But the same does not hold true for Topic pages, where the canonical changes for each page. And this is the correct way to do it.

And while you are at it, can we keep the same pagination style in both NOSCRIPT and PushState()?

Sorry to keep bothering you if you find this annoying.

Okay, so here’s the problem: category pages have an incorrect canonical, because the page parameter is not included.

That’s what I was saying.

What you’re saying is that there is no canonical possible in this case. The better solution, if that’s true, is to not render it at all.

Mistake 1: rel=canonical to the first page of a paginated series

Imagine that you have an article that spans several pages:

Specifying a rel=canonical from page 2 (or any later page) to page 1 is not correct use of rel=canonical, as these are not duplicate pages. Using rel=canonical in this instance would result in the content on pages 2 and beyond not being indexed at all.

So your advice was incorrect. The correct thing to do, per Google webmaster guidelines, is not to render canonical at all on paginated content.

@techapj can you make sure that’s the case in every common scenario? It is definitely the case on /latest (homepage).
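
The rule quoted above (no canonical on later pages of a paginated series) could be sketched as follows. This is an illustration only, not the Discourse implementation: it assumes pagination is driven by a `page` query parameter and that the first page is page 1, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit, urlunsplit


def canonical_or_none(url: str) -> Optional[str]:
    """Return a canonical URL for the first page, or None for later pages."""
    parts = urlsplit(url)
    # Assumption: a missing "page" parameter means the first page, numbered 1.
    page = parse_qs(parts.query).get("page", ["1"])[0]
    if page != "1":
        # Later pages are not duplicates of page 1, so emit no canonical at all.
        return None
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

A template would then render the link tag only when the function returns a URL.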

Yeah, Jeff.

Right on the money. Either you remove the canonical tag altogether or you keep it on all pages. Both have their problems.

If you remove the canonical tags, every filtered variant of a page (for example, when a sort parameter is used) can get indexed, which can create duplicate content issues.

If you keep the canonical, it has to change for every page, which can be a technical headache.
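
The second option, a canonical that changes per page but ignores filters, might look roughly like the sketch below. The `page` parameter and the `order` parameter used in the example are assumptions for illustration, as is treating page 1 as the first page; the function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def self_canonical(url: str, keep=("page",)) -> str:
    """Keep only pagination parameters so each page gets its own canonical."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query) if key in keep]
    # Treat page=1 as the bare URL so the first page has a single address.
    kept = [(key, value) for key, value in kept if not (key == "page" and value == "1")]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With this, a sorted view of page 3 and the plain page 3 share one canonical, while page 3 and page 4 stay distinct.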

Okay, I just made the change so that the canonical tag is not present on paginated category and topic pages.


https://github.com/discourse/discourse/commit/ecd93f7efb98c41e79077d025c2215c98f1c912d

Wait a moment: removing the canonical from topics is a terrible mistake. I am OK with removing it from latest, etc.

But removing it from topics means that Google is going to have discrete results for every post in a topic, which is terrible on so many levels.

Following a discussion with Sam on this, I’ve reverted this commit so we can rethink and do it properly.

This stuff needs extremely careful consideration:

  • do we want users entering Discourse sites on a category filter page?
  • do we want users entering Discourse sites on a category filter page on page 100?
  • do we want users to get a hit on a “list” style page at the expense of hitting the right topic?
  • do we want users entering on the top page? (probably yes)
  • is a site map desirable to increase crawling efficiency?

Having the same canonical for all the pages of the non-topic list views heavily de-emphasises them as search results, which is desirable.

I wonder if we should even allow robots to index any of the filters except for latest.

You can get to every topic on the site through latest. The fact that we allow all this sliced-and-diced crawling makes crawling activity much less effective, as Google keeps rediscovering the same content over and over.

We simply need to analyze our logs first and see how big the problem is. There is huge appeal in decreasing crawling load and increasing crawling efficiency; it makes all sites faster and better.

But we need to be ultra careful here not to cause any unwanted side effects that take months to rectify.
