Why isn't Google Indexing Discourse? SEO concerns

I am not sure this is related to sitemaps or cloud hosting. Meta is hosted on AWS, which is completely separate from where we host many of our other customers. Lately we have been seeing very uneven results for meta, and for quite a few other sites across various hosting options.

I have been trying to tune a few things to see if anything helps.

  • We no longer follow links to .rss, which saves Google from scanning the /1, /2, etc. variants of a topic that all share a canonical.

  • We explicitly tell Google not to follow links inside the .rss feed, in case it does fetch an RSS feed.

  • I temporarily disabled some canonical tuning we did - which showed promise: Search engines now blocked from indexing non-canonical pages
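One way the "do not follow links inside the feed" instruction could be expressed is an `X-Robots-Tag` response header on feed URLs. This is only a sketch of the idea, not how Discourse actually implements it: the function name `robots_headers_for` and the choice of a bare `nofollow` value are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def robots_headers_for(url_or_path: str) -> dict:
    """Return extra response headers for a request path (hypothetical helper).

    Assumption: feed URLs are identified by a ".rss" suffix on the path,
    and crawlers are told not to follow links found in the feed body.
    """
    path = urlsplit(url_or_path).path  # ignore any query string or fragment
    if path.endswith(".rss"):
        return {"X-Robots-Tag": "nofollow"}
    return {}
```

A regular topic URL gets no extra header here, so only the feed variants are affected.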

The symptom I am observing here on meta is that:

  1. Google is indeed crawling ALL the content; I can see that in the web logs.
  2. Despite the pages being crawled, 50% or so of recent new meta topics are not showing up in the index.

This is extremely concerning. Google is giving us very little visibility into “why?” here.

My next step is to get more data and set up an ongoing report. We will probably use serpapi to figure out which pages are missing from Google and then look for a pattern.
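A rough sketch of what such a report could look like, in Python. It runs a `site:` query per topic URL and lists the topics that do not come back. The SerpApi endpoint and the `organic_results` / `link` fields reflect my understanding of their JSON API and should be checked against their docs; the helper names (`serpapi_links`, `find_missing`) and the exact-match rule are assumptions for illustration only.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterable, List

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"  # assumed endpoint


def serpapi_links(query: str, api_key: str) -> List[str]:
    """Run one Google query through SerpApi and return the result links."""
    params = urllib.parse.urlencode(
        {"engine": "google", "q": query, "api_key": api_key}
    )
    with urllib.request.urlopen(f"{SERPAPI_ENDPOINT}?{params}", timeout=30) as resp:
        data = json.load(resp)
    return [r.get("link", "") for r in data.get("organic_results", [])]


def normalize(url: str) -> str:
    """Drop scheme, query and trailing slash so URLs compare cleanly."""
    parts = urllib.parse.urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")


def find_missing(
    topic_urls: Iterable[str], search: Callable[[str], List[str]]
) -> List[str]:
    """Return the topic URLs for which a site: query does not return the topic."""
    missing = []
    for url in topic_urls:
        wanted = normalize(url)
        links = search(f"site:{wanted}")
        # Exact match on the normalized URL; a looser prefix match may be
        # needed if Google lists a post-number variant instead.
        if not any(normalize(link) == wanted for link in links):
            missing.append(url)
    return missing
```

Usage would be something like `find_missing(recent_topic_urls, lambda q: serpapi_links(q, API_KEY))`, run on a schedule, with the output compared against crawl entries from the web logs to look for a pattern.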
