Why isn't Google Indexing Discourse? SEO concerns

I am not sure why Google isn’t indexing the content of our discourse community.

Look at this page here: How do I rewrite my entire frontend with AppSmith? - How do I do X? - Appsmith

The title is fairly unique. I’d imagine, when I copy and paste the title in Google, the first link should be the URL above.

Instead, not only is it not the first link, it doesn’t show up at all.

What am I doing wrong?

There could be so many reasons for this.

Is Googlebot actually crawling your site? Check mysite.com/admin/reports/web_crawlers

Is Googlebot blocked or rate limited? Check mysite.com/admin/site_settings/category/security?filter=crawler%20user%20agents

Did you add your site to Google Search Console?
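
One quick check for the blocked-crawler question is whether robots.txt shuts Googlebot out. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text and the forum.example.com URLs are made-up examples, not taken from any real site, and this says nothing about rate limiting or the admin setting mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: Googlebot
Disallow: /admin/

User-agent: BadBot
Disallow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    topic = "https://forum.example.com/t/some-topic/123"
    print("Googlebot allowed:", crawler_allowed(SAMPLE_ROBOTS, "Googlebot", topic))
```

To test a live site, you would paste in the text served at /robots.txt instead of the sample string.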

Self-hosted people can even install the following plug-in to help

The Sitemap plugin is available on our Business and Enterprise plans as well.

Probably nothing. For some reason Google seems to give the “How do I do X” a higher priority than the actual topic with that exact title. Why? I’m not sure. It might just be an AI making that decision based on unknown patterns.

Aka Google Patterns that no one knows :wink:

Well that’s a big issue then for us. We’ll look into this, but it sorta defeats the whole purpose then, coz we can’t expect people to “go through” every topic in the community. In practice, most people will google for their issue (even if they’re a member of the community) to arrive at the answer.

It’s really hard to tell what’s going on with Google in your case. I took the liberty of taking a closer look at your crawler stats, and at first glance it looks like the Google crawler isn’t visiting your community very often. Did you already try to gain insights from the Google Search Console? Maybe there’s some information there that could help in getting to the bottom of this.

I’m seeing the same behavior for a small portion of recently created topics here on Meta. I’ll discuss this with our team in order to find out if there’s something we can do or if it’s simply Google being Google. I’ll keep you updated.

Also, did you see our two blog posts on SEO?

This Google behavior started months ago and sadly is getting worse.

The sitemap in my case didn’t help …

Yeah this is the first question I’d ask.

Well…our site is appsmith.com, so all subdomains and subfolders of appsmith.com would ideally be indexed when we submit appsmith.com to GSC right?

I have gone ahead and added community.appsmith.com (our discourse forum) to GSC anyways today, but not sure if that’s going to change anything.

I would add the subdomain as a standalone property in your GSC and submit a dedicated sitemap for the subdomain.
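
Before submitting a dedicated sitemap, it can help to confirm that it actually lists the topic URLs you expect. Here is a rough sketch, assuming the sitemap follows the standard sitemaps.org XML format. The sample XML and the forum.example.com URLs are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, for illustration only.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://forum.example.com/t/first-topic/1</loc></url>
  <url><loc>https://forum.example.com/t/second-topic/2</loc></url>
</urlset>
"""


def sitemap_locations(xml_text: str) -> list[str]:
    """Return every <loc> value found in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    for url in sitemap_locations(SAMPLE_SITEMAP):
        print(url)
```

The same function works on a sitemap index, where each `<loc>` points at a child sitemap rather than a page.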

@constantine

Your forum has been indexed since May 2021

Yep :slight_smile:

This seems to be an issue then. Coz we definitely can’t continue if google doesn’t index individual posts coz it sorta defeats the entire aim of having the community (since most folks will discover content from google - including our current users, instead of going through the categories on discourse).

And this doesn’t seem to be an issue just on our discourse, but even this forum (meta.discourse.org).

To showcase this, I googled the title of this post, and this is the result I got:

Basically, for folks who end up using the Discourse community as a knowledge base (which of course many companies do), this becomes a big issue.

We’ve followed pretty much most of the guidelines given in the 2 blogs on SEO that were shared above, our content is rich, detailed, highly technical. Yet when people google for it, it doesn’t show up.

So genuine questions guys: Given our use case (which is creating a knowledgebase via support queries, which can also help in SEO), and seeing the issue at hand, would you recommend that we just start planning on finding an alternative?

Here is an extract from Google:

The indexing of your content by Google is determined by system algorithms that take into account user demand and quality checks.

Also remember that nofollow is only disabled on links posted by users at TL3 and above.

Here is something to read

In other words, my posts have more “weight” than your posts.

My apologies, you’re right :+1:

@constantine It looks like Google is preferring pages on the sub-domain https://docs.appsmith.com over pages on https://community.appsmith.com

Most search results are from docs.appsmith.com
site:appsmith.com - Google Search

A shot in the dark: Google’s preference for one sub-domain over the other might be influenced by the “Core Web Vitals” metric Largest Contentful Paint (LCP).
The simulated report on https://pagespeed.web.dev/ is “bugged” by Discourse. Better check the Google Search Console report “Experience” → “Core Web Vitals” for your domain.

Just an update folks, we self-hosted discourse and submitted the sitemap to search console and now our content is being indexed by google. So perhaps something is up with the cloud hosted version?

I am not sure this is related to sitemaps or cloud hosting. Meta is hosted on AWS, which is a completely different place from where we host many of our other customers, and lately we have been seeing very uneven results for Meta and for quite a few sites across various hosting options.

I have been trying to tune a few things to see if anything helps.

  • We no longer follow links to .rss which saves google from scanning /1 /2 etc variants of a topic that all share a canonical.

  • We explicitly tell Google not to follow links inside the .rss feed in case it gets an rss feed.

  • I temporarily disabled some canonical tuning we did - which showed promise: Search engines now blocked from indexing non-canonical pages
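
The tuning above relies on every variant of a topic page declaring the same canonical URL. If you want to see what a given page declares, a small sketch like this can pull the canonical link out of the HTML. It uses only the standard library, the sample markup and forum.example.com URL are invented, and it is not how Discourse itself generates or checks canonicals.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page source for a topic variant, for illustration only.
SAMPLE_PAGE = (
    "<html><head>"
    '<link rel="canonical" href="https://forum.example.com/t/some-topic/123">'
    "</head><body>post 2 of the topic</body></html>"
)

if __name__ == "__main__":
    print(find_canonical(SAMPLE_PAGE))
```

Running it against the HTML of the /1 and /2 variants of a topic should give the same URL if they really share a canonical.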

The symptom I am observing here on meta is that

  1. Google is indeed crawling ALL the content, I can see that in the weblogs
  2. Despite the crawling, 50% or so of recent new Meta topics are not showing up in the index.
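
To verify the first point on your own server, something like this can count Googlebot requests per path. It is only a sketch: it assumes the common nginx/Apache “combined” log format, the sample lines are invented, and it trusts the user-agent string, which can be spoofed.

```python
import re
from collections import Counter

# Matches the request, status, size, referer and user-agent fields of a
# "combined" format access log line (an assumption about your log format).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Mar/2022:10:00:00 +0000] "GET /t/some-topic/123 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Mar/2022:10:05:00 +0000] "GET /t/some-topic/123 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Mar/2022:10:06:00 +0000] "GET /latest HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/98.0"',
]


def googlebot_hits(lines) -> Counter:
    """Count requests per path whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "googlebot" in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


if __name__ == "__main__":
    for path, count in googlebot_hits(SAMPLE_LOG).most_common():
        print(count, path)
```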

This is extremely concerning. Google is giving us very little visibility into “why?” here.

My next step is to get more data and an ongoing report going. We will probably use serpapi to figure out which pages are missing from Google and try to find a pattern.
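
The comparison described above might look roughly like this. It is a sketch, not the actual tooling: `is_indexed` is a hypothetical callable that you would back with whatever search lookup you use (for example a `site:` query through a search API), and the URLs are invented.

```python
from typing import Callable, Iterable


def find_unindexed(urls: Iterable[str], is_indexed: Callable[[str], bool]) -> list[str]:
    """Return the URLs for which the lookup reports no index entry."""
    return [url for url in urls if not is_indexed(url)]


def unindexed_share(urls: Iterable[str], is_indexed: Callable[[str], bool]) -> float:
    """Return the fraction of URLs that are missing from the index."""
    url_list = list(urls)
    if not url_list:
        return 0.0
    return len(find_unindexed(url_list, is_indexed)) / len(url_list)


# Hypothetical data: topic URLs from the sitemap, and the subset that a
# search lookup reported as indexed.
ALL_TOPICS = [
    "https://forum.example.com/t/a/1",
    "https://forum.example.com/t/b/2",
    "https://forum.example.com/t/c/3",
    "https://forum.example.com/t/d/4",
]
INDEXED = {
    "https://forum.example.com/t/a/1",
    "https://forum.example.com/t/c/3",
}

if __name__ == "__main__":
    # A set's membership test stands in for the real lookup here.
    print(find_unindexed(ALL_TOPICS, INDEXED.__contains__))
    print(unindexed_share(ALL_TOPICS, INDEXED.__contains__))
```

Keeping the lookup as a plain callable makes it easy to swap the data source or to cache results between runs of the report.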

Because Meta, meaning the OPs here, doesn’t use ”why” the way Google wants? I did some experiments, and even when the context stays the same, using different sentences changes the search results. As expected.

And… personal search history is a big and not so good thing. My results quite often point somewhere other than Meta, because I do my searches here, not via Google.

Yes. Still, Discourse may have some issues with Google, and that may or may not come from its ”forum status”, where Google doesn’t react to forums the same way as to an ordinary website. Or there are some technical issues, even though that is quite rarely the reason.

It’s the same thing I did: after an initial period of improvement, it went back to being ignored.
After two months, even the new threads in the sitemap are not being indexed. This situation has been going on for months and we have lost a lot of views.

We want to try this for one month and if it doesn’t improve we will change software :sob:

Check Google Search Console report “Index” → “Coverage” for the “missing” pages. This might not show you “why” but “what is going on”.

  1. Open “Index” → “Coverage” for your domain.

  2. Select “All submitted pages”

  3. See especially “Excluded”

Description of the different stages: Index Coverage report - Search Console Help

My assumption on Google: “Crawled - currently not indexed” has some upper limit set by Google - while there are too many pages in this state, new pages will only very slowly enter the Google index.


For Google crawler performance, see the Google Search Console report “Settings” → “Crawl stats”.

Here especially the timeline and development of the “average response time” is interesting: faster response = more crawl requests

The “By purpose” breakdown is also interesting.