Removing the /2, /3, /4, etc. links for each reply within a topic URL

No, /8 is not the same as the topic. /8 points to the 8th post and the timestamp corresponds to that of the 8th post.

If you compare the ?page=2 variant to the actual post it links to then you will get the same timestamps.
For instance:

wget -q -O - https://meta.discourse.org/t/topic-list-previews-legacy/101646/959|grep published_ti
<meta property="article:published_time" content="2020-05-09T04:29:46+00:00" />
wget -q -O - https://meta.discourse.org/t/topic-list-previews-legacy/101646/?page=2|grep published_ti
<meta property="article:published_time" content="2020-05-09T04:29:46+00:00" />
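
The same comparison can be scripted. What follows is only a minimal sketch, assuming the page HTML has already been fetched into a string; the helper name `published_time` and the sample markup are made up for illustration, with the timestamp copied from the output above.

```python
from html.parser import HTMLParser


class MetaTimeParser(HTMLParser):
    """Collects the content of <meta property="article:published_time">."""

    def __init__(self):
        super().__init__()
        self.published_time = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by default.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "article:published_time":
            # Keep only the first occurrence on the page.
            if self.published_time is None:
                self.published_time = attr_map.get("content")


def published_time(html):
    """Return the article:published_time value, or None if the tag is absent."""
    parser = MetaTimeParser()
    parser.feed(html)
    return parser.published_time


# Hypothetical sample page; only the meta tag matters here.
SAMPLE_PAGE = (
    "<html><head>"
    '<meta property="article:published_time" content="2020-05-09T04:29:46+00:00" />'
    "</head><body></body></html>"
)

print(published_time(SAMPLE_PAGE))
```

Running it on the HTML of two URL variants and comparing the return values gives the same check as the two wget calls.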

Looks like it: Incorrect or failing oneboxes for links to other discourse instances - #14 by techAPJ

I’m not saying to remove time information, but just that it would be better to only send the machine-readable timestamp for the top post. From the perspective of ranking a page in search results, a forum topic is basically an article (top post) with a bunch of comments on it. It doesn’t matter to a search engine when the comments were made.

Edit: another way of passing the date to Google for a comment (as opposed to the entire page) is schema.org markup.
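
To make the schema.org idea concrete, here is a rough sketch of what per-comment dates could look like as JSON-LD. I am not claiming this is what Discourse emits. The type and property names (`DiscussionForumPosting`, `Comment`, `datePublished`) come from the schema.org vocabulary, while the function name, URLs, text and dates are all invented for the example.

```python
import json


def forum_posting_jsonld(headline, url, published, comments):
    """Build a schema.org DiscussionForumPosting with individually dated comments.

    `comments` is a list of (url, text, date_published) tuples.
    """
    return {
        "@context": "https://schema.org",
        "@type": "DiscussionForumPosting",
        "headline": headline,
        "url": url,
        # Date of the top post: the one page-level date.
        "datePublished": published,
        "comment": [
            {"@type": "Comment", "url": c_url, "text": text, "datePublished": date}
            for c_url, text, date in comments
        ],
    }


# Hypothetical topic with two replies.
doc = forum_posting_jsonld(
    headline="Example topic",
    url="https://forum.example.com/t/example-topic/123",
    published="2020-01-01T00:00:00+00:00",
    comments=[
        ("https://forum.example.com/t/example-topic/123/2", "First reply", "2020-01-02T08:00:00+00:00"),
        ("https://forum.example.com/t/example-topic/123/3", "Second reply", "2020-01-03T09:30:00+00:00"),
    ],
)

print(json.dumps(doc, indent=2))
```

The point of the structure is that the page carries one top-level date, and each reply's date is attached to that reply rather than to the page.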

Sure, /8 points to the 8th post, but from a bot’s perspective and from Google’s perspective, it’s the exact same content and URL. If you want Google to know that /8 should be treated the exact same way as the topic in the search results, then the site probably shouldn’t send an intentional signal that they are different. Only the human user needs to know that the timestamps are different, and that information is printed in the text on the page.

If someone at Google has to make decisions about when to override site-defined canonical URLs, one of those exceptions could be something like “two different timestamps in the intentional metadata means different pages – therefore override the canonical URL.”
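
To be clear, nobody outside Google knows whether such a rule exists. The sketch below only shows how a site owner could audit for that kind of ambiguity: it groups pages by their declared canonical URL and flags any group whose pages carry more than one `article:published_time`. The class and function names and the sample pages are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Pulls the canonical URL and published_time out of a page's markup."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.published_time = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "link" and attr_map.get("rel") == "canonical":
            self.canonical = attr_map.get("href")
        elif tag == "meta" and attr_map.get("property") == "article:published_time":
            self.published_time = attr_map.get("content")


def conflicting_canonicals(pages):
    """Return {canonical_url: [timestamps]} for canonicals with more than one timestamp.

    `pages` is a list of HTML strings, one per crawled URL variant.
    """
    seen = defaultdict(set)
    for html in pages:
        signals = HeadSignals()
        signals.feed(html)
        if signals.canonical and signals.published_time:
            seen[signals.canonical].add(signals.published_time)
    return {url: sorted(times) for url, times in seen.items() if len(times) > 1}


def make_page(canonical, timestamp):
    """Hypothetical page head, just enough markup for the audit."""
    return (
        f'<head><link rel="canonical" href="{canonical}" />'
        f'<meta property="article:published_time" content="{timestamp}" /></head>'
    )


TOPIC = "https://forum.example.com/t/example-topic/123"
pages = [
    make_page(TOPIC, "2020-01-01T00:00:00+00:00"),  # the topic URL itself
    make_page(TOPIC, "2020-01-08T12:00:00+00:00"),  # a /8-style variant
]

print(conflicting_canonicals(pages))
```

An empty result would mean every URL variant sends the same timestamp as its canonical page, which is the unambiguous state I'm arguing for.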

It’s often hard for programmers to think of all the edge cases unless they have experience with encountering that thing, so it might be inconceivable to the Google programmers that identical pages could have two different timestamps, even though it’s easy for Discourse users to understand why that might happen.

I used to work at a company where part of my job was to get sites unbanned from Google. (They weren’t doing anything shady, but there were just technical problems.) Since no one knew exactly how Google’s ranking tech works, and it changes regularly, the starting place was to try to think like a Search engineer and remove anything that could possibly be ambiguous or confusing to machines. I could never say exactly which thing worked, but it always worked after some time of systematically fixing things like that.

This is in. If you want to enable this experimental feature, you need to flip the value of the hidden site setting SiteSetting.allow_indexing_non_canonical_urls.

Please share the results with us.

Makes perfect sense to me.

Yes, yes, and yes. Well articulated.

See

Right now Google is correctly using the canonical URLs:
We can monitor this in Google Search Console with the report ‘Index’ → ‘Coverage’ → ‘Alternate page with proper canonical tag’.

About Alternate page with proper canonical tag:
“This page is a duplicate of a page that Google recognizes as canonical. This page correctly points to the canonical page, so there is nothing for you to do.” :slight_smile:

I have no idea how the /X links for each reply affect SEO, and I generally try to avoid pandering to Google’s whims. But from a practical standpoint I’m seeing that Google is not picking up new replies in many long-running topics on my Discourse forum, whereas it does quickly index most new topics. And when it does index a new reply the link doesn’t go to the specific reply but rather to /XXXX?page=YY. I have no idea if that’s good for SEO, but it’s definitely not good for human users that are searching for something specific.

This topic has been silent for quite a while. I was curious: has anyone tested this experimental feature? Now that over two years have passed, I’d love to know if this is still considered an experiment or if anyone can confirm that it fixes the issue?

Similar to what @RGJ had done back in Nov '21, I found a large public forum (Python) that uses Discourse and I did a Google search for a topic in their forum with many replies to see if it would show a bunch of individual replies from the same topic.

To my delight, Google did NOT show me a large list of individual replies in the results! The only results were the topic itself and then the category it lives inside! This is a GREAT sign!

Although, when I do the same search that @RGJ did back in Nov '21, the problem still exists with that specific search.

I also ran a new test search with another topic on this Discourse community forum, and found a similar problem, with multiple results that were coming from the same topic.

It’s great to see this problem doesn’t always exist with all Discourse forums… but I don’t understand why the problem would be resolved with the Python forum while it still exists in the Discourse forum.

Does anyone have any ideas on how to make this problem go away?

I’m considering migrating an existing forum from NodeBB to Discourse, but before I do, I need to know there’s a way to resolve this so it doesn’t create an SEO nightmare for our domain.

That search returns a small number of links into the topic, but the topic has 58 posts, so you’d expect to see 58 individual results if the /nn URLs were all being indexed. Perhaps the spider is seeing links to individual posts of the topic in other posts, so it indexes those individual pages?

Having said that, turning off /nn would be a nightmare for my forum. There are often long discussions about how to resolve issues, which might contain multiple “this seems to work” posts, each followed a few posts later by an “oh no, it doesn’t” post. We often refer back to the actual “fix” posts when someone else has that problem in future. If all you can do is point people to a page that contains the answer somewhere on it, and that quite possibly contains incorrect solutions too, that’s not going to help anyone.

And, yes, there might be Discourse ways to highlight solutions, e.g. the Solved plugin, but my forum has 22 years of posts where only the last 12 months were made in Discourse.

Hey Seth!
I’m currently facing the same issue on my project.
I have multiple URLs for a single page because it is paginated.

I think that this post can be helpful.
I managed to use this code to redirect all my paginated pages into their canonical page.

You put that code in an .htaccess file to redirect pages in Discourse?

Discourse doesn’t use Apache2. Apache can be used in front of Discourse as a reverse proxy, but it is far from optimal for that.

And I don’t understand this topic at all. That URL structure has nothing to do with SEO. Perhaps I’m just missing something, but my forum still has quite high SEO value, and that comes from the content.

I think the problem here is the crawl budget.

No, not that either.