Google indexed link not pointing to the correct post

Quintin_Par · April 22, 2017, 6:19pm

I did a search for this topic, which is a comment on CodingHorror’s March 2017 post

The first link Google returns is the blog post, and the second is the forum comment/discussion post.

When I click on the link it takes me to another post that is not the one I searched for. The one I searched for is this.

sam · April 26, 2017, 7:13pm

Something about the Google indexing here is weird. The canonical on that page is:

<link rel="canonical" href="https://discourse.codinghorror.com/t/thunderbolting-your-video-card/5157?page=2">

so I am not even sure how that is showing up there.

codinghorror · April 29, 2017, 2:08am

My google results from your link are in this order

https://blog.codinghorror.com/thunderbolting-your-video-card/
https://discourse.codinghorror.com/t/thunderbolting-your-video-card/5157/9
https://discourse.codinghorror.com/t/thunderbolting-your-video-card/5157/21
https://discourse.codinghorror.com/t/thunderbolting-your-video-card/5157

The actual correct result would be post number 12…

https://discourse.codinghorror.com/t/thunderbolting-your-video-card/5157/12

… so 9 and 21 are a bit off but certainly “on the same page” ish.

Still, it is odd to search for a whole quoted paragraph plus, verbatim, in Google.

sam · April 29, 2017, 1:02pm

For your specific case I wonder if canonical should be the parent blog at least for all the comments that render on parent blog.

I also wonder about adding an option to redirect crawlers to the correct page vs canonical.

For a 100% embedded case you want the search to always hit the parent blog except for super rare cases

codinghorror · April 29, 2017, 8:18pm

The actual search in this case is a bit bizarre, so I am not comfortable basing an entire philosophy on a sample size of one.

Quintin_Par · May 3, 2017, 1:26am

I saw this bug on another forum and wanted an example to showcase, hence the search by a paragraph.

Quintin_Par · May 3, 2017, 1:29am

Please don’t change the current functionality to a blogpost canonical. Here’s the reason:

https://meta.discourse.org/t/how-can-i-get-google-to-index-all-responses-and-comments-as-new-url-endpoints-or-pages/

gerhard · December 10, 2018, 4:31pm

For the record, this is how the search results look for me right now:

https://blog.codinghorror.com/thunderbolting-your-video-card/
https://discourse.codinghorror.com/t/thunderbolting-your-video-card/5157?page=2

The strange thing is that the search term isn’t present on page 2. It’s found in post 12, which belongs to page 1. So when you land on page 2, which translates to posts 21 and onwards, you don’t find what you were looking for. That’s quite confusing for users.

And I think Google gets confused too because the blog post links to post 31 via the link.

When you visit https://discourse.codinghorror.com/t/thunderbolting-your-video-card/5157/31 as Googlebot, you land on a page that contains post 11 up to post 31. And the canonical wrongly points to https://discourse.codinghorror.com/t/thunderbolting-your-video-card/5157?page=2.

I think the correct fix is to show the search bot the full page that includes the linked post. For post 31 that would be page 2 starting at post 21 and ending at max 40.

Mittineague · December 10, 2018, 6:32pm

Might this involve the discrepancy between visible and deleted posts count? i.e. deleted posts still retain their post id value.

gerhard · December 10, 2018, 7:19pm

No, it doesn’t have anything to do with deleted posts. I consider this a bug.

It simply doesn’t use the correct post offset. The crawler view renders posts in pages. Post 1-20, 21-40,… If a crawler requests a certain post number, the app should render the right page. For post 31 it needs to select page 2 and render posts 21-40. Everything else results in a wrong search index.
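The offset described above can be sketched roughly as follows. This is only an illustration, not Discourse's actual code: the function names and the `PAGE_SIZE` constant are made up for the example, and it assumes 20 posts per crawler page and contiguous post numbers.

```python
# Hypothetical sketch of the post-number-to-page mapping described above.
# Assumes 20 posts per crawler page (1-20, 21-40, ...); names are illustrative.
PAGE_SIZE = 20


def page_for_post(post_number: int, page_size: int = PAGE_SIZE) -> int:
    """Return the 1-based crawler page that contains the given post number."""
    if post_number < 1:
        raise ValueError("post numbers start at 1")
    return (post_number - 1) // page_size + 1


def post_range_for_page(page: int, page_size: int = PAGE_SIZE) -> tuple[int, int]:
    """Return the (first, last) post numbers a crawler page may contain."""
    first = (page - 1) * page_size + 1
    return first, first + page_size - 1


def canonical_url(topic_url: str, post_number: int) -> str:
    """Build a canonical URL that matches the page actually rendered."""
    page = page_for_post(post_number)
    return topic_url if page == 1 else f"{topic_url}?page={page}"
```

With this mapping, post 31 lands on page 2 (posts 21 to 40) and post 12 lands on page 1, so the rendered posts and the canonical would agree.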

sam · December 11, 2018, 1:33am

This is the fundamental issue… we have no “correct” canonical page if we are displaying content from 2 different pages on the screen. The only way to correct this is to make pages work differently for “crawling” purposes, and this enters other worlds of pain.

For my blog what I do is just keep the whole chunk of comments with the blog post, eg:

https://www.google.com.au/search?q=“One+commonly+overlooked+impedance+to+development+flow+is+typos”

But the issue described here is far more fundamental: we give web crawlers a bunch of content splayed across 2 pages and then we just pick the canonical for one of the posts in the set.

One way I can think of to resolve this: tell Google not to index “post” links (eg: https://meta.discourse.org/t/google-indexed-link-not-pointing-to-the-correct-post/61443/9 is a post link) using meta tags, which may force its hand to crawl the canonical and index that instead. It may work, I don’t know. Very tricky problem.
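A rough sketch of that idea, for illustration only. It assumes post links have the shape `/t/<slug>/<topic_id>/<post_number>` as in the example URL, and that a standard `noindex` robots meta tag is the mechanism; the helper names are hypothetical and this is not how Discourse is implemented.

```python
import re
from urllib.parse import urlsplit

# Assumed shape of a "post" link: /t/<slug>/<topic_id>/<post_number>
POST_LINK_PATH = re.compile(r"^/t/[^/]+/\d+/\d+/?$")


def is_post_link(url: str) -> bool:
    """True when the URL path ends in a post number after the topic id."""
    return bool(POST_LINK_PATH.match(urlsplit(url).path))


def robots_meta_for(url: str) -> str:
    """Return a noindex meta tag for post links, nothing for topic pages."""
    if is_post_link(url):
        return '<meta name="robots" content="noindex">'
    return ""
```

Topic URLs and `?page=N` URLs are left indexable here, so only the per-post links would be discouraged from the index. Whether Google then falls back to the canonical is exactly the open question.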

Interestingly there is a far more severe issue I am noticing when I search

google indexing site:meta.discourse.org

I find these 2 broken links, and we need to figure out how this even happened:

This on the second page:

https://meta.discourse.org/t/google-complaining-indexed-though-blocked-by-robots-txt/96408?page=2

This on the third

https://meta.discourse.org/t/canonical-tag-generated-with-page-2/32842?page=4

It is not really making sense how this sneaked in. My first port of call here would be to check the site map plugin to confirm it does not include these bad links AND then to confirm there is no logic where we are presenting google with content on these pages instead of an error page.
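The first of those checks could be sketched like this. It assumes the sitemap is a standard sitemaps.org `urlset` document that has already been downloaded as text; the function names are illustrative and nothing here is taken from the sitemap plugin itself.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org <urlset> documents.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locs(xml_text: str) -> set[str]:
    """Collect every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text}


def suspects_in_sitemap(xml_text: str, suspects: list[str]) -> list[str]:
    """Return the suspect URLs that the sitemap actually lists."""
    locs = sitemap_locs(xml_text)
    return [url for url in suspects if url in locs]
```

Passing the two `?page=` URLs above as `suspects` would show whether the sitemap is the source. An empty result points toward the second check, i.e. what those URLs actually serve to a crawler.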
