Adding Canonical Redirects for SEO Optimization

adrian2 · February 23, 2015, 1:08am

I apologize if there is an easy solution to this problem but I have searched this forum to no avail. I am trying to optimize our new Discourse forum for SEO, and it seems that multiple pages containing duplicate content can be accessed at different URLs, thus hurting search engine ranking by “splitting” the traffic for each duplicated page. So imagine we have some content at:

forum.foo.com/c/uncategorized

The issue is that this same exact page can also be accessed at:

forum.foo.com/c/uncategorized/l/latest?category_id=1&page=1

This means we need to add a canonical redirect from the second URL to the first by putting the tag <link rel="canonical" href="http://forum.foo.com/c/uncategorized" /> in the page header whenever the URL forum.foo.com/c/uncategorized/l/latest?category_id=1&page=1 gets loaded.

Is there any built-in support for this in Discourse? This issue seems to really be hurting our search engine rankings, as just about every category page has a duplicate URL at something like */l/latest?category_id=1&page=1. I do not mind doing some minor tinkering in the Ruby on Rails backend to get this done, but we would prefer not to dive into any complex hacks.
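A minimal sketch of the mapping described above, not Discourse's implementation. The helper names `canonical_category_url` and `canonical_link_tag` are hypothetical, and the sketch assumes the `/l/latest` suffix and the query string never change the content, which only holds for the first page of a listing.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_category_url(url: str) -> str:
    """Map a duplicate category URL to its canonical form (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path
    # Assumption: "/l/latest" serves the same page as the bare category path.
    suffix = "/l/latest"
    if path.endswith(suffix):
        path = path[: -len(suffix)]
    # Drop the query string and fragment entirely.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Render the <link> element to place in the page header."""
    href = escape(canonical_category_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


print(canonical_link_tag(
    "http://forum.foo.com/c/uncategorized/l/latest?category_id=1&page=1"
))
```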

codinghorror · February 23, 2015, 1:11am

Google is usually smart enough to understand that query strings can produce identical content with a near infinite number of variables used in said query strings.

Where is your proof that this is “hurting rankings”? Do you have data? There is a lot of snake oil in the SEO “industry”.

sigurdur · February 23, 2015, 1:20am

What is your URL, so we can see this in action with the site: and inurl: operators?

jeffwidman · February 24, 2015, 12:41am

You can manually specify how you want Google to interpret URL parameters using Webmaster Tools: Official Google Webmaster Central Blog: Configuring URL Parameters in Webmaster Tools

As Jeff says, though, for most sites I leave that setting at “Let Google decide how to interpret the URL parameters” and it seems to work just fine.

However, this doesn’t apply when two pages have similar content but different URLs (not counting the URL parameters). For example, foo.com/category1 and foo.com/category1/latest will be seen as different pages, regardless of how you tweak the URL parameter settings in WMT. The OP is correct that it’d be best to specify a canonical URL for any two pages that have distinct URLs and identical or nearly identical content.

codinghorror · February 24, 2015, 12:45am

OK, that seems fair… @techapj can you add this to your list: for all category pages, make sure that

https://meta.discourse.org/c/support/l/latest

has a rel="canonical" link tag pointing to

https://meta.discourse.org/c/support

(Be sure this is in the crawler version of the page too.)

I suppose there is a minor hole here if you have the categories page set as the homepage in topnav, etc, but I’m ignoring that for now.

adrian2 · February 24, 2015, 8:17pm

An example of two URLs with duplicate content:

http://forum.learntomod.com/c/uncategorized/l/latest?category_id=1&page=1

and

http://forum.learntomod.com/c/uncategorized

adrian2 · February 24, 2015, 8:26pm

Thank you so much! Glad to hear an update is in the works.

techAPJ · February 25, 2015, 4:16pm

Implemented via

https://github.com/discourse/discourse/pull/3235#issuecomment-75987533

codinghorror · February 26, 2015, 9:19am

Confirmed it looks good, in

https://meta.discourse.org/c/support/l/latest

I see

<link href="https://meta.discourse.org/c/support" rel="canonical" />
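A check like the one above can be scripted. This is a rough sketch using only the Python standard library; the names `CanonicalFinder` and `find_canonicals` are made up for illustration, and the HTML is passed in as a string rather than fetched from a live site.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also reach this handler.
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html_text: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonicals


page = '<head><link href="https://meta.discourse.org/c/support" rel="canonical" /></head>'
print(find_canonicals(page))
```

A page with no canonical link yields an empty list, and more than one entry would indicate conflicting tags.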

Amarjeet · September 28, 2015, 9:32am

I have seen that the platform has a few SEO navigation and crawl issues. Here are the problems:

Point 1. You’ve implemented rel=‘next’/‘prev’, not as a link tag but as an itemprop. You’ve put the canonical in properly only on Topics, not on Categories.
feature - Discourse Meta & feature - Discourse Meta both have the same canonical <link rel="canonical" href="https://meta.discourse.org/c/feature" />, whereas it should be different for each page.

Point 2. NOSCRIPT URL : Plugin for signatures? & Plugin for signatures?

PushState URL: Plugin for signatures? & Plugin for signatures?

Point 3. Refer to the answer to Point 1.

codinghorror · September 28, 2015, 10:18am

Everything is implemented correctly per Google Webmaster guides. This was already covered in previous topics.

https://support.google.com/webmasters/answer/1663744?hl=en

Please cite specific words and sentences that are not being correctly met there, on that page.

Searchers commonly prefer to view a whole article or category on a single page. Therefore, if we think this is what the searcher is looking for, we try to show the View All page in search results. You can also add a rel=“canonical” link to the component pages to tell Google that the View All version is the version you want to appear in search results.

See above. “VIEW ALL”. That would be the category root page…

Amarjeet · September 29, 2015, 6:22am

Jeff,

What you say is right, but the problem is that there is no “VIEW ALL” page in Discourse forums. I am citing examples from this very forum.

Let us take the Feature category. The URI is feature - Discourse Meta, and if you look at it with a user agent such as Googlebot/Bingbot/Slurp, you can see that the listing is broken into pages: feature - Discourse Meta, then page=2, and so on up to page=72, each containing 30 links. But all of these pages have the same canonical, <link rel="canonical" href="https://meta.discourse.org/c/feature" />, so you are leaving the other 71 pages out of the index.
You can even check the Google Cache, if you don’t believe me, at feature - Discourse Meta
or use the site: command: site:https://meta.discourse.org/c/feature - Google Search

But the same does not hold true for Topics pages where the canonical changes for each page. And this is the correct way to do this.

And while you are at it, can we keep the same pagination style in both the NOSCRIPT and the PushState() versions?

Sorry to keep bothering you if you are finding this annoying.

riking · September 29, 2015, 6:25am

Okay, so here’s the problem: Category pages have incorrect canonical: the page parameter is not included.

Amarjeet · September 29, 2015, 6:26am

That’s what I was saying.

codinghorror · September 29, 2015, 9:03am

What you’re saying is that there is no canonical possible in this case. The better solution, if that’s true, is to not render it at all.

Mistake 1: rel=canonical to the first page of a paginated series

Imagine that you have an article that spans several pages:

example.com/article?story=cupcake-news&page=1

example.com/article?story=cupcake-news&page=2

and so on

Specifying a rel=canonical from page 2 (or any later page) to page 1 is not correct use of rel=canonical, as these are not duplicate pages. Using rel=canonical in this instance would result in the content on pages 2 and beyond not being indexed at all.

So your advice was incorrect. The correct thing to do, per Google webmaster guidelines, is not to render canonical at all on paginated content.

@techapj can you make sure that’s the case in every common scenario? It is definitely the case on /latest (homepage).

Amarjeet · September 29, 2015, 10:03am

Yeah, Jeff

Right on the money. Either you remove the canonical tag altogether or you keep it on all pages. Both have their problems.

If you remove the canonical tags, all filtered parameters (if a filter such as sort is used) will get indexed, which can create duplicate-content issues.

If you keep the canonical, the problem is that it has to change for every page, which can be a technical headache.
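One way to picture the second option is a canonical that keeps the page number but drops every other parameter. The sketch below is illustrative only: `paginated_canonical` is a hypothetical helper, `order=views` stands in for any filter parameter, and page 1 is assumed to be identical to the bare listing URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def paginated_canonical(url: str) -> str:
    """Build a per-page canonical: keep ?page=N, drop all other parameters."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    page = query.get("page", ["1"])[0]
    # Assumption: page 1 is the same document as the URL with no page parameter.
    if page.isdigit() and int(page) > 1:
        new_query = urlencode({"page": page})
    else:
        new_query = ""
    return urlunsplit((parts.scheme, parts.netloc, parts.path, new_query, ""))


# "order=views" is a made-up filter parameter used only for this example.
print(paginated_canonical("https://meta.discourse.org/c/feature?page=2&order=views"))
print(paginated_canonical("https://meta.discourse.org/c/feature?page=1"))
```

With this shape, each page of a listing points at itself, while sorted or filtered variants of the same page collapse onto one URL.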

techAPJ · September 29, 2015, 12:13pm

Okay, I just made the change so that the canonical tag is not present on paginated category and topic pages.

sam · September 29, 2015, 9:11pm

Wait a moment: removing the canonical from topics is a terrible mistake. I am OK with removing it from latest, etc.

But removing it from topics means that Google is going to have discrete results for every post in a topic, which is terrible on so many levels.
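To illustrate the concern, here is a rough sketch of collapsing per-post URLs onto one topic URL. It assumes topic paths of the form /t/<slug>/<topic_id>/<post_number>, uses a placeholder host, and ignores pagination within a topic; `topic_canonical` is a hypothetical helper, not Discourse code.

```python
import re
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumed path shape: /t/<slug>/<topic_id> with an optional /<post_number>.
TOPIC_PATH = re.compile(r"^(/t/[^/]+/\d+)(?:/\d+)?/?$")


def topic_canonical(url: str) -> Optional[str]:
    """Return the topic-level URL for a post URL, or None for non-topic pages."""
    parts = urlsplit(url)
    match = TOPIC_PATH.match(parts.path)
    if match is None:
        return None
    return urlunsplit((parts.scheme, parts.netloc, match.group(1), "", ""))


# forum.example.com and the topic below are placeholders.
print(topic_canonical("https://forum.example.com/t/some-topic/123/45"))
print(topic_canonical("https://forum.example.com/c/support"))
```

Without a canonical like this, every post number in a topic is a distinct URL that a crawler may index separately.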

eviltrout · September 29, 2015, 9:24pm

Following a discussion with Sam on this, I’ve reverted this commit so we can rethink and do it properly.

sam · September 29, 2015, 9:46pm

This stuff needs extremely careful consideration:

Do we want users entering Discourse sites on a category filter page?
Do we want users entering Discourse sites on a category filter page on page 100?
Do we want users to get a hit on a “list” style page at the expense of hitting the right topic?
Do we want users entering on the top page? (Probably yes.)
Is a site map desirable to increase crawling efficiency?

Having the same canonical for all the pages of the non-topic list views heavily de-emphasizes them as search results, which is desirable.

I wonder if we should even allow robots to index any of the filters except for latest.

You can get to every topic on the site through latest. The fact that we allow all this sliced-and-diced crawling makes crawling activity much less effective, as Google keeps rediscovering the same content over and over.

We simply need to analyze our logs first and see how big the problem is. There is huge appeal in decreasing crawling load and increasing crawling efficiency; it makes all sites faster and better.

But we need to be ultra careful here not to cause any unwanted side effects that take months to rectify.
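The log analysis suggested above could start with something as small as the sketch below. It assumes the logs have already been parsed into (URL, user agent) pairs; the sample rows are invented for illustration and are not real crawl data, and `crawl_hits_by_path` is a hypothetical helper.

```python
from collections import Counter
from urllib.parse import urlsplit


def crawl_hits_by_path(requests, bot_marker="Googlebot"):
    """Count crawler requests per path, ignoring the query string."""
    hits = Counter()
    for url, user_agent in requests:
        # Skip requests that do not come from the crawler of interest.
        if bot_marker not in user_agent:
            continue
        hits[urlsplit(url).path] += 1
    return hits


# Invented sample rows, standing in for parsed access-log entries.
sample = [
    ("/c/feature?page=2", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    ("/c/feature?page=3", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    ("/latest", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    ("/c/feature", "Mozilla/5.0 (X11; Linux x86_64)"),
]

for path, count in crawl_hits_by_path(sample).most_common():
    print(path, count)
```

Paths with many hits spread over different query strings are candidates for the kind of repeated rediscovery described above.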
