Removing the /2, /3, /4, etc links for each reply within a topic URL

No, /8 is not the same as the topic. /8 points to the 8th post and the timestamp corresponds to that of the 8th post.

If you compare the ?page=2 variant with the post it actually links to, you get the same timestamp.
For instance:

wget -q -O - 'https://meta.discourse.org/t/topic-list-previews-legacy/101646/959' | grep published_ti
<meta property="article:published_time" content="2020-05-09T04:29:46+00:00" />
wget -q -O - 'https://meta.discourse.org/t/topic-list-previews-legacy/101646/?page=2' | grep published_ti
<meta property="article:published_time" content="2020-05-09T04:29:46+00:00" />
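
If you would rather not eyeball grep output, the same check can be sketched in Python. This is only an illustration using the standard library; the class and function names are made up here, and it assumes you have already fetched the page HTML yourself (for example with the wget commands above).

```python
from html.parser import HTMLParser


class PublishedTimeParser(HTMLParser):
    """Remembers the first <meta property="article:published_time"> value."""

    def __init__(self):
        super().__init__()
        self.published_time = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta" or self.published_time is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "article:published_time":
            self.published_time = attr_map.get("content")


def published_time(html):
    """Return the article:published_time value in html, or None if absent."""
    parser = PublishedTimeParser()
    parser.feed(html)
    return parser.published_time


if __name__ == "__main__":
    # The meta tag returned by both wget commands above.
    sample = ('<meta property="article:published_time" '
              'content="2020-05-09T04:29:46+00:00" />')
    print(published_time(sample))
```

Running published_time on the HTML of two URL variants and comparing the results shows whether they advertise the same timestamp.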

Looks like it: Incorrect or failing oneboxes for links to other discourse instances - #14 by techAPJ

I’m not saying to remove time information, but just that it would be better to only send the machine-readable timestamp for the top post. From the perspective of ranking a page in search results, a forum topic is basically an article (top post) with a bunch of comments on it. It doesn’t matter to a search engine when the comments were made.

Edit: another way of passing the date to Google for a comment (as opposed to the entire page) is schema.org markup.
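
As a rough sketch of what that could look like: the topic is marked up as a schema.org DiscussionForumPosting with the top post's date, and each reply is a nested Comment with its own datePublished. This is not what Discourse emits; the function name, the example.com URLs and the timestamps are placeholders for illustration only.

```python
import json


def forum_topic_jsonld(headline, topic_url, topic_published, comments):
    """Build schema.org JSON-LD where each reply carries its own date.

    comments is a list of dicts with "url" and "published" keys
    (a hypothetical input shape chosen for this sketch).
    """
    return {
        "@context": "https://schema.org",
        "@type": "DiscussionForumPosting",
        "headline": headline,
        "url": topic_url,
        # Page-level date: the top post only.
        "datePublished": topic_published,
        "comment": [
            {
                "@type": "Comment",
                "url": c["url"],
                # Per-reply date lives on the comment, not on the page.
                "datePublished": c["published"],
            }
            for c in comments
        ],
    }


if __name__ == "__main__":
    # Placeholder values, not real data.
    doc = forum_topic_jsonld(
        "Example topic",
        "https://example.com/t/example-topic/1",
        "2020-01-01T00:00:00+00:00",
        [{"url": "https://example.com/t/example-topic/1/8",
          "published": "2020-01-02T00:00:00+00:00"}],
    )
    print(json.dumps(doc, indent=2))
```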

Sure, /8 points to the 8th post, but from a bot’s perspective and from Google’s perspective, it’s the exact same content and URL. If you want Google to know that /8 should be treated the exact same way as the topic in the search results, then the site probably shouldn’t send an intentional signal that they are different. Only the human user needs to know that the timestamps are different, and that information is printed in the text on the page.

If someone at Google has to make decisions about when to override site-defined canonical URLs, one of those exceptions could be something like “two different timestamps in the intentional metadata means different pages – therefore override the canonical URL.”
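
To find pages that send this kind of mixed signal on your own site, something like the following could work. It is a sketch under assumptions: you have already crawled the URL variants and collected each one's canonical URL and published_time, and the tuple layout and function name are invented here. It says nothing about what Google actually does.

```python
from collections import defaultdict


def conflicting_timestamps(pages):
    """Find canonical URLs whose variants advertise different timestamps.

    pages: iterable of (url, canonical_url, published_time) tuples.
    Returns {canonical_url: sorted list of distinct timestamps} for every
    canonical URL with more than one distinct timestamp.
    """
    seen = defaultdict(set)
    for _url, canonical, published in pages:
        seen[canonical].add(published)
    return {c: sorted(ts) for c, ts in seen.items() if len(ts) > 1}


if __name__ == "__main__":
    # Placeholder crawl results, not real data.
    crawl = [
        ("https://example.com/t/topic/1", "https://example.com/t/topic/1",
         "2020-01-01T00:00:00+00:00"),
        ("https://example.com/t/topic/1/8", "https://example.com/t/topic/1",
         "2020-01-02T00:00:00+00:00"),
    ]
    for canonical, stamps in conflicting_timestamps(crawl).items():
        print(canonical, stamps)
```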

It’s often hard for programmers to think of all the edge cases unless they have run into them before, so it might be inconceivable to the Google programmers that identical pages could have two different timestamps, even though it’s easy for Discourse users to understand why that might happen.

I used to work at a company where part of my job was to get sites unbanned from Google. (They weren’t doing anything shady, but there were just technical problems.) Since no one knew exactly how Google’s ranking tech works, and it changes regularly, the starting place was to try to think like a Search engineer and remove anything that could possibly be ambiguous or confusing to machines. I could never say exactly which thing worked, but it always worked after some time of systematically fixing things like that.

This is in. If you want to enable this experimental feature, you need to flip the value of the hidden site setting SiteSetting.allow_indexing_non_canonical_urls.

Please share the results with us.

Makes perfect sense to me.

Yes, yes, and yes. Well articulated.
