Why isn't Google Indexing Discourse? SEO concerns

Yep :slight_smile:

This seems to be an issue then, because we definitely can’t continue if Google doesn’t index individual posts. It sort of defeats the entire aim of having the community, since most folks (including our current users) will discover content from Google instead of going through the categories on Discourse.

And this doesn’t seem to be an issue just on our discourse, but even this forum (meta.discourse.org).

To showcase this, I googled the title of this post, and this is the result I got:

Basically, for folks who end up using the Discourse community as a knowledge base (which of course many companies do), this becomes a big issue.

We’ve followed most of the guidelines given in the 2 blogs on SEO that were shared above, and our content is rich, detailed, and highly technical. Yet when people google for it, it doesn’t show up.

So genuine questions guys: Given our use case (which is creating a knowledgebase via support queries, which can also help in SEO), and seeing the issue at hand, would you recommend that we just start planning on finding an alternative?

here is an extract from Google

The indexing of your content by Google is determined by system algorithms that take into account user demand and quality checks.

Also remember that only TL3 and above users have nofollow disabled on their links.

Here is something to read

In other words, my posts have more “weight” than your posts.

My apologies, you’re right :+1:

@constantine It looks like Google is preferring pages on the subdomain https://docs.appsmith.com over pages on https://community.appsmith.com

Most search results are from docs.appsmith.com
site:appsmith.com - Google Search

A shot in the dark: Google’s preference for one subdomain over the other might be influenced by the “Core Web Vitals” metric Largest Contentful Paint (LCP).
The simulated report on https://pagespeed.web.dev/ is “bugged” by Discourse. Better to check the Google Search Console report “Experience” → “Core Web Vitals” for your domain.

Just an update folks: we self-hosted Discourse and submitted the sitemap to Search Console, and now our content is being indexed by Google. So perhaps something is up with the cloud-hosted version?
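
For anyone repeating this, it can help to confirm which URLs the submitted sitemap actually lists before comparing against Search Console. A minimal Python sketch, assuming the sitemap follows the standard sitemaps.org schema; the function name `sitemap_locations` and the `forum.example.com` URLs are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(xml_text: str) -> list[str]:
    """Return every <loc> value from a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample, not real forum data.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://forum.example.com/t/first-topic/1</loc></url>
  <url><loc>https://forum.example.com/t/second-topic/2</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in sitemap_locations(SAMPLE):
        print(url)
```

The same function works on a sitemap index, because index files also use `<loc>` elements in the same namespace; in that case each returned URL is itself a sitemap to fetch and parse.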

I am not sure this is related to sitemaps or cloud hosting. Meta is hosted on AWS, which is a completely different place from where we host many of our other customers. We started seeing very uneven results for Meta lately, and also for quite a few sites across various hosting options.

I have been trying to tune a few things to see if anything helps.

  • We no longer follow links to .rss, which saves Google from scanning the /1, /2, etc. variants of a topic that all share a canonical.

  • We explicitly tell Google not to follow links inside the .rss feed in case it gets an RSS feed.

  • I temporarily disabled some canonical tuning we did - which showed promise: Search engines now blocked from indexing non-canonical pages
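
To get a feel for how many crawled URLs collapse onto one canonical, a crawl log can be grouped by topic. This is only a sketch: it assumes topic paths look like `/t/<slug>/<topic_id>` with an optional trailing `/<post_number>`, and the function name and sample paths are hypothetical.

```python
import re
from collections import defaultdict

# Assumed path shape: /t/<slug>/<topic_id>[/<post_number>]
TOPIC_PATH = re.compile(r"^(/t/[^/]+/\d+)(?:/\d+)?/?$")


def group_by_canonical(paths):
    """Map each canonical topic path to the crawled variants that share it."""
    groups = defaultdict(list)
    for path in paths:
        match = TOPIC_PATH.match(path)
        if match:  # non-topic paths such as /latest are skipped
            groups[match.group(1)].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical paths, as they might appear in a web log.
    crawled = ["/t/seo-topic/100", "/t/seo-topic/100/1", "/t/seo-topic/100/2", "/latest"]
    for canonical, variants in group_by_canonical(crawled).items():
        print(canonical, len(variants))
```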

The symptom I am observing here on meta is that

  1. Google is indeed crawling ALL the content, I can see that in the weblogs
  2. Despite the pages being crawled, 50% or so of recent new Meta topics are not showing up in the index.

This is extremely concerning, and Google is giving us very little visibility into “why?” here.

My next step is to get more data and an ongoing report going. We will probably use serpapi to figure out which pages are missing from Google and try to find a pattern.
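
A report like that could be sketched roughly as follows. It assumes you already have a list of topics with creation dates and, from whatever lookup you use, a set of URLs confirmed to be in the index. It does not call serpapi or any search API; `missing_by_week` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def missing_by_week(topics, indexed_urls):
    """Count topics missing from the index, grouped by ISO week of creation.

    topics: iterable of (url, created_on) pairs, created_on being a date
    indexed_urls: set of URLs confirmed to be in the search index
    Returns {(iso_year, iso_week): (missing, total)}, ordered by week.
    """
    counts = defaultdict(lambda: [0, 0])  # week -> [missing, total]
    for url, created_on in topics:
        iso = created_on.isocalendar()
        week = (iso[0], iso[1])
        counts[week][1] += 1
        if url not in indexed_urls:
            counts[week][0] += 1
    return {week: (c[0], c[1]) for week, c in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    topics = [
        ("https://forum.example.com/t/a/1", date(2024, 1, 1)),
        ("https://forum.example.com/t/b/2", date(2024, 1, 2)),
        ("https://forum.example.com/t/c/3", date(2024, 1, 8)),
    ]
    indexed = {"https://forum.example.com/t/a/1", "https://forum.example.com/t/c/3"}
    for week, (missing, total) in missing_by_week(topics, indexed).items():
        print(week, f"{missing}/{total} missing")
```

Tracking the per-week ratio rather than a single total makes it easier to see whether a change helped or whether older topics are simply catching up.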

Because Meta, meaning OPs here, doesn’t use “why” the way Google wants? I ran some experiments, and even when the context stays the same, using different sentences changes the search results. As expected.

And… personal search history is a big and not so good thing. My results quite often point somewhere other than Meta, because I do my searches here, not via Google.

Yes. Still, Discourse may have some issues with Google, and that may or may not come from “forum status”, where Google doesn’t react to forums the same way as to an ordinary website. Or there are some technical issues, even if that is quite rarely the reason.

It’s the same thing I did: after an initial period of improvement, it went back to being ignored.
After two months, even the new threads in the sitemap are not being indexed. This situation has been going on for months and we have lost a lot of views.

We want to try this for one month and if it doesn’t improve we will change software :sob:

Check Google Search Console report “Index” → “Coverage” for the “missing” pages. This might not show you “why” but “what is going on”.

  1. Open “Index” → “Coverage” for your domain.

  2. Select “All submitted pages”

  3. See especially “Excluded”

Description of the different stages: Index Coverage report - Search Console Help

My assumption on Google: “Crawled - currently not indexed” has some upper limit set by Google - while there are too many pages in this state, new pages will only very slowly enter the Google index.


On Google Crawler Performance see Google Search Console report “Settings” → “Crawl stats”

Here especially the timeline and development of the “average response time” is interesting: faster response = more crawl requests

And also “By purpose” is interesting:

My issue is still the noindex tag. Google is skipping all the answers in topics because of noindex, yet the answers are what users are googling, and indexing covers just the questions.
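
One way to verify what a given page actually serves is to look at its robots meta tag and canonical link. A minimal Python sketch using only the standard library; the class and function names and the sample HTML are made up, and it only inspects the HTML (a noindex can also be sent in the `X-Robots-Tag` HTTP header, which this does not check):

```python
from html.parser import HTMLParser


class IndexingHints(HTMLParser):
    """Collect robots meta directives and the canonical URL from an HTML page."""

    def __init__(self):
        super().__init__()
        self.robots = []       # directives such as "noindex", "nofollow"
        self.canonical = None  # href of <link rel="canonical">, if present

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.robots += [d.strip().lower() for d in content.split(",") if d.strip()]
        elif tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")


def indexing_hints(html: str) -> dict:
    """Return whether the page asks not to be indexed, plus its canonical URL."""
    parser = IndexingHints()
    parser.feed(html)
    return {"noindex": "noindex" in parser.robots, "canonical": parser.canonical}


if __name__ == "__main__":
    # Hypothetical page source for illustration.
    page = (
        '<html><head><meta name="robots" content="noindex, follow">'
        '<link rel="canonical" href="https://forum.example.com/t/some-topic/42">'
        "</head><body></body></html>"
    )
    print(indexing_hints(page))
```

Running it against the source of a topic URL and of a single-post URL shows whether the two are treated differently.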

Well, that can be fixed.

I also noticed that Google, unlike Bing, ignores the title of the discussion and I don’t understand why. This never happens with my other forum on SMF…

Errors that appeared on Bing Webmaster Tools this morning, I’ve been using it for a week on a new site.

The 12 pages reported have a meta description of 230 characters.

Could it be a problem for Google too?

I’m not sure it matters too much?

Tidying up titles and making them more accurate, readable and clickable is probably more important.
Also educate your users so they won’t mind you editing the title so it contains more keywords from the topic.
If you say that cleaning up the title and adding keywords makes searches more likely to find and popularise their topics, they won’t mind.

I absolutely agree.
However, the error refers to the length of the meta description…

Write to Microsoft.
Ask them to stop kicking out High severity errors when it’s easier to either modify their bot or do a simple string truncate after 150 chars before saving to their DB.
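
For what it’s worth, the “simple string truncate” is only a few lines. A hedged Python sketch, using the 150-character figure from above as the default; the function name is made up and this is not how Bing or Discourse actually handles descriptions:

```python
def truncate_description(text: str, limit: int = 150) -> str:
    """Collapse whitespace and cut text to at most `limit` characters.

    Prefers to end on a word boundary; falls back to a hard cut when the
    first `limit` characters contain no space at all.
    """
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Drop a trailing partial word only if the cut landed mid-word.
    if text[limit] != " " and " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip()


if __name__ == "__main__":
    long_description = "word " * 60  # hypothetical over-long description
    shortened = truncate_description(long_description)
    print(len(shortened), shortened[-10:])
```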

Waste of a good email. Bing can’t handle simple 301 redirections and is knocking on 5-year-old 404s because of that.

Banning Bing saves resources and doesn’t harm anybody’s SEO :rofl:

(And yes, I know we are doing off topic big time…)

How does one remove the noindex tag?

Can it be done this way?

Or by using a plugin, as one dev told me, though she’s asking too high a price for me. Or by using Varnish instead?

I really don’t know anything beyond what I’ve been told. All I know for sure is that noindex is harming SEO on my forum right now.

I have been updating Discourse every 3 days and I am noticing an improvement in searches. In some cases the threads are indexed after only 3 hours.
The same thing seems to be happening here (for example, this one was indexed after 3 hours). The most important thing is that the words contained in the title are now finally displayed in the search result.

Great job!
I hope it continues like this…

This topic prompted us to look into the issue and @sam made lots of improvements over the last couple of weeks. We’ve been closely monitoring how many newly created topics here on Meta are missing from the Google index. That number has been decreasing every day.

WOW :star_struck: … stuff is getting better every day! Not a single missing topic on the week of the 13th!!!
