The strange thing is that the search term isn’t present on page 2. It’s found in post 12, which belongs to page 1. So when you land on page 2, which translates to posts 21 and onwards, you don’t find what you were looking for. That’s quite confusing for users.
And I think Google gets confused too, because the blog post links directly to post 31.
When you visit https://discourse.codinghorror.com/t/thunderbolting-your-video-card/5157/31 as Google Bot, you land on a page that contains posts 11 through 31, and the canonical wrongly points to https://discourse.codinghorror.com/t/thunderbolting-your-video-card/5157?page=2.
I think the correct fix is to show the search bot the full page that includes the linked post. For post 31 that would be page 2, starting at post 21 and ending at post 40 at most.
No, it doesn’t have anything to do with deleted posts. I consider this a bug.
It simply doesn’t use the correct post offset. The crawler view renders posts in pages: posts 1-20, 21-40, and so on. If a crawler requests a certain post number, the app should render the page that contains it. For post 31 it needs to select page 2 and render posts 21-40. Anything else results in an incorrect search index.
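For what it’s worth, the page selection is just a ceiling division over the chunk size. A minimal sketch in Python of what I mean (the 20-posts-per-page chunk size and the function names are my own assumptions, not actual Discourse internals):

```python
import math

POSTS_PER_PAGE = 20  # assumed crawler chunk size: posts 1-20, 21-40, ...

def crawler_page_for_post(post_number: int) -> int:
    """Return the 1-based crawler page that contains the given post number."""
    return math.ceil(post_number / POSTS_PER_PAGE)

def posts_on_page(page: int) -> range:
    """Post numbers rendered on a given crawler page (inclusive bounds)."""
    first = (page - 1) * POSTS_PER_PAGE + 1
    return range(first, first + POSTS_PER_PAGE)

# For post 31: page 2, rendering posts 21-40, which matches the
# canonical .../t/thunderbolting-your-video-card/5157?page=2
assert crawler_page_for_post(31) == 2
assert list(posts_on_page(2))[0] == 21 and list(posts_on_page(2))[-1] == 40
```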
This is the fundamental issue… we have no “correct” canonical page if we are displaying content from 2 different pages on the screen. The only way to correct this is to make pages for “crawling” purposes work differently, and that enters other worlds of pain.
For my blog, what I do is just keep the whole chunk of comments with the blog post, e.g.:
But the issue described here is far more fundamental: we give web crawlers a bunch of content splayed across 2 pages and then we just pick the canonical for one of the posts in the set.
One way I can think of to resolve this is to tell Google not to index “post” links (e.g. https://meta.discourse.org/t/google-indexed-link-not-pointing-to-the-correct-post/61443/9 is a post link) using meta tags, which may force its hand to crawl the canonical and index that instead. It may work, I don’t know. Very tricky problem.
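If we went the meta-tag route, the usual lever would be a `noindex, follow` robots tag on the post-permalink view, with the canonical pointing at the page that actually contains the post. A rough sketch of the head tags such a view could emit (the helper name and URL layout are hypothetical, not existing Discourse code):

```python
POSTS_PER_PAGE = 20  # assumed crawler chunk size: posts 1-20, 21-40, ...

def head_tags_for_post_link(base: str, topic_slug: str, topic_id: int, post_number: int) -> str:
    """Head tags for a /t/<slug>/<id>/<post> permalink as served to crawlers:
    keep the post link itself out of the index, and point the bot at the
    page that actually contains the post."""
    page = (post_number - 1) // POSTS_PER_PAGE + 1
    canonical = f"{base}/t/{topic_slug}/{topic_id}"
    if page > 1:
        canonical += f"?page={page}"
    return (
        '<meta name="robots" content="noindex, follow">\n'
        f'<link rel="canonical" href="{canonical}">'
    )

# Post 9 falls on page 1, so the canonical is the bare topic URL;
# post 31 would get ...?page=2 instead.
print(head_tags_for_post_link(
    "https://meta.discourse.org",
    "google-indexed-link-not-pointing-to-the-correct-post",
    61443,
    9,
))
```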
Interestingly, there is a far more severe issue I am noticing when I search for:
google indexing site:meta.discourse.org
I find these 2 broken links; we need to figure out how this even happened:
It doesn’t really make sense how this sneaked in. My first port of call here would be to check the sitemap plugin to confirm it does not include these bad links, AND then to confirm there is no logic where we are presenting Google with content on these pages instead of an error page.
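As a first pass on that check, something along these lines would list what the sitemap actually contains and show whether each URL serves content rather than an error (the sitemap location is an assumption on my part, and a sitemap index file would need one more level of recursion):

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://meta.discourse.org/sitemap.xml"  # assumed location
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(sitemap_url: str) -> list[str]:
    """Fetch a sitemap and return every <loc> URL it lists."""
    with urllib.request.urlopen(sitemap_url) as resp:
        root = ET.fromstring(resp.read())
    return [loc.text for loc in root.findall(".//sm:loc", NS)]

def report(urls: list[str]) -> None:
    """Print the HTTP status of each URL so bad links that still serve
    content (200 instead of 404/410) stand out."""
    for url in urls:
        try:
            with urllib.request.urlopen(url) as resp:
                print(resp.status, url)
        except urllib.error.HTTPError as err:
            print(err.code, url)

if __name__ == "__main__":
    report(sitemap_urls(SITEMAP_URL))
```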