Need a way to add "meta robots noindex" in topics from a category

Continuing the discussion from SEO for Thin Content or Modify Meta Tags:

I’m struggling with the same problem here.

I’m using WP-Discourse and it is great! But for every new blog post, it creates a topic with the exact same title in my community. Having two URLs with the same title is not a good thing, since they steal relevance from each other in search results.

Then the comments from the topic are also printed below the blog posts, which generates duplicated content (same content across multiple URLs).

Both are huge SEO problems that could lead to a domain penalty.

How to fix this?

The solution would be a simple checkbox in the category configuration box:

[ ] Hide Topics from this category in search results.

When the checkbox is marked, a noindex tag would be inserted in the header of all the pages related to it: the category itself, topics, pagination, etc.

<meta name="robots" content="noindex, follow">

This way, everything is still there to the users, but ignored by search engines.
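To make the idea concrete, here is a minimal sketch of the decision the checkbox would drive. It is not Discourse code: the function name, the set of category IDs, and the ID 5 are all hypothetical, and it emits the standard follow value for the robots tag.

```python
from typing import Optional, Set

# Hypothetical: IDs of the categories where an admin ticked the checkbox.
NOINDEX_CATEGORY_IDS: Set[int] = {5}


def robots_meta_tag(category_id: Optional[int],
                    noindex_ids: Set[int] = NOINDEX_CATEGORY_IDS) -> str:
    """Return the tag to place in <head>, or an empty string to stay indexable."""
    if category_id in noindex_ids:
        # Keep the page out of the index, but let crawlers follow its links.
        return '<meta name="robots" content="noindex, follow">'
    return ""


print(robots_meta_tag(5))        # topic in a hidden category
print(repr(robots_meta_tag(7)))  # topic in any other category
```

The same check would apply to the category page and its paginated listings, since they all know which category they belong to.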


Things that don’t fix the problem

Let me go a few steps ahead and address some common responses. I saw a few topics about this issue, and they all had suggestions that don’t actually fix the problem.

Robots.txt

The most common solution presented is to add a “Disallow: /c/category/id” line in robots.txt. But this would only remove the category page itself from the search results and not the topics, which is the main problem here.

Topic URLs all share the same structure, with no category in the path, so we can’t block them by simply adding a “Disallow” line in robots.txt.

Ex:

site.com/t/title-of-the-topic/id
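To illustrate the limitation, here is a small sketch using Python’s standard urllib.robotparser module. The category path /c/blog/5 and the topic ID 123 are made-up examples, not paths from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a single category page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /c/blog/5
"""


def is_crawlable(path: str) -> bool:
    """Check a path against the rules above for a generic crawler."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("*", "https://site.com" + path)


# The category listing is blocked...
print(is_crawlable("/c/blog/5"))
# ...but a topic from that category has no category in its URL, so the rule never matches.
print(is_crawlable("/t/title-of-the-topic/123"))
```

Because the rule matches on the URL path, the only way to cover the topics would be to block /t/ entirely, which would hide every topic in the community.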

Unlist topics

An unlisted topic is still visible to search engines. It is hidden from the community listings, but anyone with the direct link can still access it. And we need to send users to the topics, so we add a link to each one in the blog post. So the search engines will also find all the unlisted topics.

Note that nofollowing this link won’t make Googlebot ignore it: Official Google Webmaster Central Blog: Evolving “nofollow” – new ways to identify the nature of links

At the same time, unlisting the topic leads to a reduction in user engagement, because the users won’t be able to jump from one topic to another inside the community.

So this idea doesn’t solve anything. It leads to a reduction in engagement, while not hiding the topics from search engines at all.

Require login to see the topics in that category

When a new user clicks the comment button, they will see a “This page doesn’t exist” message instead of the topic. The user thinks something is broken and leaves the site. So no comments and no new user registrations. Very bad for engagement and usability.


In conclusion, it would be very useful to have this option added to Discourse, or to have someone develop a simple plugin.

The tag needs to be rendered by the core, in the HTML the server sends, because Googlebot may ignore a tag that is only injected with JavaScript.

The SEO guys would very much appreciate it!

This doesn’t directly answer your question, but Discourse has a new “embed set canonical url” site setting that might help you. When that setting is enabled, the canonical URL of Discourse topics that have been created through the WP Discourse plugin or through the Discourse JavaScript embed code will be set to the URL of the associated blog post.
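One way to check what a crawler sees is to inspect the raw HTML the server returns for a topic. The sketch below uses only Python’s standard html.parser; the sample HTML and the blog URL are invented for illustration and are not output copied from Discourse.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class HeadTagFinder(HTMLParser):
    """Collects the robots meta content and the canonical link, if present."""

    def __init__(self) -> None:
        super().__init__()
        self.robots: Optional[str] = None
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.robots = attributes.get("content")
        elif tag == "link" and (attributes.get("rel") or "").lower() == "canonical":
            self.canonical = attributes.get("href")


def inspect_head(html: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (robots content, canonical href) found in the given HTML."""
    finder = HeadTagFinder()
    finder.feed(html)
    return finder.robots, finder.canonical


# Invented sample: a topic page whose canonical points at the blog post.
SAMPLE = """<html><head>
<link rel="canonical" href="https://blog.example.com/my-post/">
</head><body>Topic content</body></html>"""

print(inspect_head(SAMPLE))
```

If the canonical URL printed for a topic is the blog post URL, the setting is doing its job for that topic.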

That does help, thank you!

I didn’t know about this new feature.

One question, though:

I’m not very familiar with how embedding works, apart from WP-Discourse. If a user creates a topic that points to another internal link, will that link be set as the canonical URL?

No, if a user creates a topic by pasting a URL into the composer’s title field, a featured link will be created. This does not cause the featured link URL to be set as the canonical URL.

When the “embed set canonical url” site setting is enabled, topics that have an associated topic_embed will have their canonical URL set to the topic_embed’s URL. The Discourse JavaScript embed code does this automatically. It can also be done by creating a topic through the API and passing an embed_url property. This is how our WordPress plugin works.
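As a rough sketch of the API route, the snippet below builds (but does not send) a request that creates a topic with an embed_url. The /posts.json endpoint and the Api-Key and Api-Username headers reflect my understanding of the Discourse API and should be checked against its documentation. The host names, key, and helper function name are placeholders.

```python
import json
import urllib.request
from typing import Optional


def build_create_topic_request(base_url: str, api_key: str, api_username: str,
                               title: str, raw: str, embed_url: str,
                               category_id: Optional[int] = None) -> urllib.request.Request:
    """Build a POST request that creates a topic tied to an external URL."""
    payload = {"title": title, "raw": raw, "embed_url": embed_url}
    if category_id is not None:
        payload["category"] = category_id
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/posts.json",  # assumed endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Api-Key": api_key,            # assumed header name
            "Api-Username": api_username,  # assumed header name
        },
        method="POST",
    )


request = build_create_topic_request(
    "https://forum.example.com", "placeholder-key", "system",
    title="My blog post", raw="Discuss the post here.",
    embed_url="https://blog.example.com/my-post/", category_id=5,
)
print(request.full_url)
print(json.loads(request.data))
# To actually send it: urllib.request.urlopen(request)
```

A topic created this way gets a topic_embed, so with the setting enabled its canonical URL should point at the embed_url.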

Hi @simon and team - good thread! I am facing the same issue with my community (https://community.americanradioclub.com/). I would like to post to Discourse automatically, but for some (if not all) posts from WordPress, I want to set a noindex on the Discourse discussion to avoid duplicate content and an SEO penalty. Has anyone found a good solution to this? Thanks!
