Add option to set canonical_url to embed_url

From time to time we get requests to set the canonical URL of embedded topics to the URL of the blog post. I’ve created a pull request that does exactly that. It unconditionally uses the URL of the original blog post (embed_url) as canonical for the topic.

There have been several earlier discussions on this, such as “Google indexed link not pointing to the correct post” and “Duplicate Content”.

After reading those posts I’m not so sure about my solution anymore, so I’d like to get some feedback from you.

  • Should this be configurable? Is there a good reason for keeping the current behavior of always using the topic’s URL as canonical?

  • Should the blog post’s URL only be used as canonical for the first N pages presented to the search bot? After all, only a certain number of posts are embedded in the blog post. (N should probably be 1.)
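
To make the two questions above concrete, here is a minimal sketch of the decision being discussed. It is not the code from the pull request. The function name, the `use_embed_url` switch, the `max_embed_pages` parameter (the "N" above), and the `?page=` URL format for crawler pages are all assumptions made for illustration.

```python
from typing import Optional


def canonical_url(
    topic_url: str,
    embed_url: Optional[str],
    page: int = 1,
    use_embed_url: bool = False,   # hypothetical opt-in switch, off by default
    max_embed_pages: int = 1,      # hypothetical "N" from the question above
) -> str:
    """Pick the canonical URL for one crawler-facing page of a topic."""
    # Point at the blog post only when the switch is on, the topic actually
    # has an embed_url, and the page is within the first N pages.
    if use_embed_url and embed_url and page <= max_embed_pages:
        return embed_url
    # Otherwise keep the current behavior: the topic's own URL is canonical.
    # The "?page=" suffix for later pages is an assumed URL format.
    if page > 1:
        return f"{topic_url}?page={page}"
    return topic_url
```

With N set to 1, only the first page of an embedded topic would point at the blog post. Later pages, and all topics without an `embed_url`, would keep pointing at the forum.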

I’d appreciate your feedback on this. I’m sure there are lots of different use cases out there and I’d like to make an informed decision before I change anything that could affect search engines.

My thought is that if you are copying and re-posting content from a blog post for any reason, conversation or not, the blog post is the original content and should be pointed to as canonical.

Yes. And by default don’t enable the blog post’s URL as canonical. Let the customer set the switch. Otherwise this is going to change a lot of Search referral traffic all of a sudden.

IMHO, only the top post, the one that links to the blog post, should be made canonical. The responses and follow-ups should not be.

With the WordPress plugin, sites can choose between publishing an excerpt, or publishing the full post to Discourse. Sites that are only publishing an excerpt might not want the canonical URL set to the blog post.

IMO it should be a per host setting here, default off:

That’s not possible. Discourse presents topics as paginated content to crawlers. That’s why I suggested changing only the canonical of the first page.

Yeah, I’m going to make this a per-host setting.
@simon Will this work for the WordPress plugin as well?

Yes, that should work. When a post is published from WordPress it creates a TopicEmbed on Discourse, with the embed_url set to the post’s permalink.

We just have to be careful here… this is a very sharp instrument. If, for example, WordPress is in “Top N” mode, where it shows only the best content, we can end up setting a canonical to a page that does not have all the overlapping content. This is a terrible signal to search engines and can be penalised heavily.

In fact, the whole “collapsing” of the OP may make this a bad idea. The OP really should be a complete duplicate of the canonical page, so we may need a different technique there that collapses on the client side.

I would not rush anything here.

Howdy folks :wave:

I originally wanted to weigh in here and join in the calls for this feature, but after diving in a bit deeper I wanted to share what I learned about how this works (in case anyone missed it like I did initially!)

We’ve just embedded Discourse as the comment system for our blog and I had a little mini freak out when I clicked the “Show full post…” button and saw the whole blog copied without the correct canonical URL :flushed:

After taking a few deep breaths I went into my “debug mode” and started checking the straight HTML response and checked how much of the post is actually there. As it turns out only the initial paragraph is included in the HTML and therefore this is all Google will see. Phew!
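
For anyone who wants to repeat that check, here is a rough sketch that reads a raw HTML response and reports the canonical link it declares. It uses only the Python standard library. The class name, the function name, and the sample markup are made up for illustration and are not taken from a real Discourse page.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it encounters."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, so split before matching.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Illustrative markup only, not an actual server response.
sample = (
    '<html><head><link rel="canonical" '
    'href="https://forum.example.com/t/hello/42" /></head>'
    "<body><p>First paragraph only.</p></body></html>"
)
print(find_canonical(sample))
```

To check a live page, fetch it without running JavaScript (for example with `urllib.request.urlopen`) and pass the decoded body to `find_canonical`. What a particular crawler actually receives may differ from a plain fetch.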

Having a second look at it, it makes perfect sense in the way the UX is laid out. I’m assuming the reason it’s hidden behind a button is because you want people to be able to read the full post and not affect SEO :+1:

I guess initially I was surprised that “Show full post…” wasn’t just a link to the original blog :thinking: but I guess it’s an OK way to do it :joy:

This feature has now been implemented with the embed set canonical url site setting. That setting is disabled by default. When enabled, it sets the canonical URL for embedded topics to the embedded content’s URL.

The feature has existed for a while now. I’d be curious to hear from any sites that have enabled it about how it has affected their SEO ranking.

Hey @simon, I was struggling to find a solution for setting canonical URLs on selected topics in my community when I ran into this post.

It seems that this setting might offer a solution, but I don’t understand what “embedded topics” are. I’ve tried looking for it on this community but I couldn’t find any explainers. Maybe this is something very basic, but could you tell me what embedded topics are, or how to embed topics in a Discourse community?

An embedded topic is a topic that has its embed_url property set to the URL of an external site. I am only aware of this being done when topics are published to Discourse via the API. For example, the Discourse WordPress plugin and the Discourse JavaScript embed code both create embedded topics.

If you are publishing your topics to Discourse from an external site, this approach would make sense. You won’t be able to use this approach for topics that are created directly on Discourse though.

So can that be used if we change the title of some blog posts (adding the update date for SEO purposes, for example) and/or to avoid duplicated content?

We really need that because we are using embedded content from Drupal, and this is the first time I’ve come across this thread :neutral_face: