Issues Google Search Console is reporting for my Discourse site (some due to Discourse structure, some perhaps due to my own administration of the site)

After the initial hysteria passed (the kind that can overtake a person who discovers that for nearly half a year Google hasn’t been paying any attention to his website, and he didn’t even know about it!!),

I’ll try to list here only those causes which are related to Discourse code, i.e. which (as far as I can tell) are fixable neither by Google nor by me, but only by the Discourse team.

When I click on ‘Video Indexing Report’:

And when I click on the ‘Enhancements > Videos’ link:

Please help.

Earlier, when I happened to find the problem (all this can be skipped):

After several months of losing users, today I found that my site has been blocked/forbidden to Google!!!

Even earlier, how I found this problem (all of this can be skipped):

I did read almost all of this meta topic before asking.

In August 2024, I noticed that traffic to my website had dropped by as much as 95%. But I ignored it, thinking that maybe I wasn’t posting enough.

But today I found that whatever term I searched on Google, restricting the search to my own site only (***site:BathindaHelper.com jobs in bathinda***), it gave ZERO results (the single result it shows from my site is actually just a suggestion that I create Google Ads to show this result from my site, which indicates that my site has indeed been indexed):


And finally I also checked Google Analytics (perhaps renamed to Google Events), and it clearly shows that since 17 June 2024 Google hasn’t been referring visitors to my site any more.

Did you set your site either to force login or to stop showing topics to TL0+ users? Google can crawl a site only if it is visible to the world. Or perhaps you have blocked Google’s user agents.

Is this the same forum where you have DNS issues?


Are you asking because of Site does not appear in google searches - #2 by Bathinda? I think the reply below answers the OP’s question.


I used the word ‘force’ wrongly. (I meant to say that I was forcing Google Search to produce results from my own site, BathindaHelper.com.)

  • I didn’t create my site using any abnormal/forced method.
  • I didn’t deliberately tinker with anything related to TL0+ or the like.
  • Over the past half hour, I’ve found that (among one or two other small issues) somehow my robots.txt file is the culprit, but I’ve not yet been able to find out how to fix it.
  • I don’t remember having DNS issues (are you talking about the distant past?). My site is working OK, except that when I (as admin) hard-refresh my browser, it sometimes takes nearly 30 to 50 seconds to open, but after that it works OK.

Thanks for replying.

Edit:
I’ve deselected the robots.txt option:

but I can’t say whether Google Search Console is now reporting that all is OK or not:

Yeah, I totally missed the order. And now we’ve got a demonstration of what can happen when someone:

  • answers old topics
  • goes off topic
  • doesn’t read topics :joy:

Yes, my bad.


Check out these settings:

  • allowed crawler user agents
  • blocked crawler user agents

But AFAIK Discourse doesn’t have a plain robots.txt file per se, as most sites do; it is generated by some strange Ruby thingy, and there aren’t many settings where an admin can adjust it, apart from those two settings and slowing down bots.
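If you want to see what that generated file actually contains, you can fetch it straight from the site; a minimal Python sketch (using bathindahelper.com from earlier in this topic — any Discourse site serves the file at the same standard path):

```python
from urllib.request import urlopen

# Discourse serves its dynamically generated robots.txt at the standard path.
with urlopen("https://bathindahelper.com/robots.txt") as resp:
    print(resp.read().decode("utf-8"))
```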

That was just me and my fast fingers :man_facepalming:


Did you disable that now or before indexing stopped?

Specify in robots.txt that this site is allowed to be indexed by web search engines.

If you do not permit search engines to index your site, it does not surprise me that they don’t.


Will do and report back.

I disabled this after opening this topic (say, 30 minutes ago), while this problem has been there for 3 months. But I’ve not been able to independently verify whether this ‘deselection’ has fixed the ‘Google Indexing’ fault or not.

I’m in doubt: if I don’t disable/block crawlers via robots.txt, are ALL crawlers ALLOWED? Or is it the contrary, that if I don’t explicitly ALLOW crawlers in robots.txt, then all crawlers are BLOCKED from indexing?

I totally forgot that. You should select it. If you don’t use it, then you must check and edit robots.txt manually to be sure it guides bots as you want.

But you can take a look to see if you find something there that would stop Google.
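For what it’s worth, you can sanity-check the allow/block behaviour of any robots.txt with Python’s standard urllib.robotparser; a minimal sketch (the topic URL is just a made-up example):

```python
from urllib.robotparser import RobotFileParser

URL = "https://bathindahelper.com/t/some-topic/123"  # hypothetical topic URL

# An empty robots.txt contains no Disallow rules, so by default
# every crawler is ALLOWED to fetch every path.
empty = RobotFileParser()
empty.parse([])
print(empty.can_fetch("Googlebot", URL))  # True

# Crawlers are only blocked when robots.txt says so explicitly.
blanket_block = RobotFileParser()
blanket_block.parse(["User-agent: *", "Disallow: /"])
print(blanket_block.can_fetch("Googlebot", URL))  # False
```

So the default is ‘allowed’: robots.txt can only take permissions away.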


OK.
That means all Discourse admins would (normally) need to supply a robots.txt file.
So I’ll read the topic about this (how, and what should be in this file) in detail tomorrow.

Second, if it’s not too much to explain: can you tell me an easy way by which I could tinker with some settings in my Discourse admin panel and at the same time check live/in real time whether Google is now able to access (and then index) my site freely, or whether it is still getting the ‘Access Forbidden - 403’ error?!

Edit: Though I’ll also try to find similar resources on Google myself now/later.
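In the meantime, one crude real-time check is to request the site while identifying as Googlebot and look at the HTTP status code; a minimal Python sketch, assuming the forum answers at bathindahelper.com (the user agent string is the real Googlebot one, quoted later in this topic):

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError

# Googlebot's real user agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

req = Request("https://bathindahelper.com/", headers={"User-Agent": GOOGLEBOT_UA})
try:
    with urlopen(req) as resp:
        print(resp.status)   # 200 means this crawler user agent is not blocked
except HTTPError as err:
    print(err.code)          # 403 means the server is refusing this user agent
```

Note that this only exercises the server-side user agent blocking; robots.txt is advisory, so Googlebot additionally checks it on its own before crawling.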

Well, no. It means that normally admins keep robots.txt enabled to avoid manual tinkering :wink: But sure, the blocked bots list etc. are what an admin may want to modify.


Can you check what your setting for blocked_crawler_user_agents is?

  1. This setting is as below (I didn’t change anything):

  2. Here I wrote these two terms, google and google.com, yesterday as an experiment. I don’t know whether this takes priority over ‘Blocked Crawler User Agents’ or not, or whether it has fixed my problem or not (Google says it has queued my crawling/indexing request, which might take up to 2-3 days):

  3. And you can find my robots.txt here.

Kindly tell me which takes priority if all three have contradictory settings.

That shouldn’t have an effect, since Google uses “Googlebot” and variations thereof for crawling:


Indeed that had the main effect!!

Thank you all, big thanks for helping me resolve the main big issue by using this setting:

But because of the many other (small) issues affecting Google indexing, explained in the first post of this very meta topic, I’d like to keep the topic open.

Also, I’d be obliged if someone could tell me what happens if I’ve blocked some Crawler-1 under ‘Blocked Crawler User Agents’ and at the same time allowed the same under ‘Allowed Crawler User Agents’.
And what if I’ve allowed it under ‘Allowed...’ but blocked it through robots.txt? What DOES take priority?

You must remove compatible. It blocks practically everything, including Googlebot, because of this:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

And blocking facebook isn’t that good an idea either, if you share topics on Facebook.

Every term you put in the blocklist blocks every bot that has that word in its user agent string. So, stay alert.
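The substring behaviour is easy to see for yourself; a small Python illustration of that matching rule (my own sketch, assuming case-insensitive substring matching, not Discourse’s actual code):

```python
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Each blocklist entry blocks any crawler whose user agent string contains it.
for entry in ["compatible", "facebook", "Googlebot"]:
    hits = entry.lower() in GOOGLEBOT_UA.lower()
    print(f"blocklist entry {entry!r} matches Googlebot: {hits}")

# 'compatible' matches, so Googlebot gets a 403 along with nearly every other bot.
```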


Oh, maybe that’s why I’m still getting an error while trying to crawl/index any topic (except the home page) through Google Search Console:

But why (even when ‘compatible’ was blocked) was only the home page available to Google Search Console, as shown below:

I just removed that ‘compatible’ and will report back.

Finally!!! I seem to have overcome the ‘forbidden’ error for the main/home page and individual topics, with 90% help from your side and 10% experimenting on my side. Big thank you.

After removing ‘compatible’ from the ‘Blocked Crawler User Agents’ list, I found a note under another setting which (stupid of me to have ignored it) essentially asks users not to fill in any value under ‘Allowed Crawler User Agents’ unless they’re pretty sure of what they’re doing. So here it was! Ignoring a warning written in caps brought me so many months of Google ignoring my site, and so much trouble:


For anyone coming to this topic because of the ‘Access Forbidden - 403’ error in Google Search Console:

  • Mainly two things solved my problems: one, removing ‘compatible’ from the ‘Blocked Crawler User Agents’ list, and
  • two, emptying (leaving it as it is by default) the setting ‘Allowed Crawler User Agents’.

The topic will remain open for other Google Search issues (though none as critical as this one was).
