After the initial hysteria passed (which can overtake a person who discovers that for nearly half a year Google hasn't been paying any attention to his website, and he didn't even know about it!!), I'll try to list here only those causes which are related to Discourse code, i.e. which are fixable neither by Google nor by me (as far as I can tell), but only by the Discourse team.
In August 2024, I noticed that traffic to my website had dropped by up to 95%. But I ignored it, thinking that maybe I wasn't posting enough.
But today I found that whatever term I searched on Google, restricting the search to my own site only: ***site:BathindaHelper.com jobs in bathinda***, it gave ZERO results (the single result it shows from my site is actually just a suggestion that I create Google Ads to show this result from my site, which indicates that my site HAS indeed been indexed):
And finally, I also checked Google Analytics (perhaps renamed to Google Events) and it's clearly showing that since 17 June 2024, Google hasn't been referring visitors to my site any more.
Have you made your site require login, or stopped showing topics to TL0+ users? Google can use a site only if it is visible to the world. Or perhaps you have blocked Google's user agents.
I used the word 'force' wrongly. (I meant to say that I was forcing Google Search to produce search results from my own site, BathindaHelper.com.)
I didn't create my site using any abnormal/forced-login method.
I didn't deliberately tinker with anything related to TL0+ or related settings.
Over the past half hour, I've found that (among one or two other small issues) my robots.txt file is somehow the culprit, but I haven't yet been able to find out how to fix this.
I don't remember having DNS issues (are you talking about the distant past?). My site is working OK, except that when I (as admin) hard-refresh my browser, it sometimes takes nearly 30 to 50 seconds to open, but after that it works fine.
But AFAIK Discourse doesn't have a plain robots.txt file per se, as most sites have; it is generated by some Ruby code, and there aren't many settings where an admin can adjust it, except those two settings and slowing down bots.
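Incidentally, even though the file is generated, it is still served at the standard /robots.txt path, so the easiest way to see exactly what it is telling crawlers right now is just to fetch it. A minimal Python sketch (using my own domain from this thread):

```python
import urllib.request

# Discourse generates robots.txt dynamically from site settings,
# but serves it at the standard path, so we can simply inspect it.
url = "https://bathindahelper.com/robots.txt"
with urllib.request.urlopen(url, timeout=30) as resp:
    print(resp.read().decode("utf-8"))
```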
I disabled this after opening this topic (about 30 minutes ago), whereas the problem has existed for 3 months. So I haven't yet been able to independently verify whether this 'deselection' has fixed the Google indexing fault or not.
I'm in doubt: if I don't disable/block bots via robots.txt, are ALL bots then ALLOWED? Or is it the contrary, that if I don't explicitly ALLOW bots via robots.txt, then all bots are BLOCKED from indexing?
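For reference, the robots.txt protocol itself is allow-by-default: anything not explicitly disallowed for a crawler is allowed. A quick way to convince yourself is Python's built-in urllib.robotparser (the two rule sets below are made-up examples, not any site's actual file):

```python
from urllib.robotparser import RobotFileParser

# Case 1: an empty robots.txt disallows nothing, so everything is allowed.
empty = RobotFileParser()
empty.parse("".splitlines())
print(empty.can_fetch("Googlebot", "https://bathindahelper.com/latest"))  # True

# Case 2: an explicit "Disallow: /" for every agent blocks everything.
block_all = RobotFileParser()
block_all.parse("User-agent: *\nDisallow: /".splitlines())
print(block_all.can_fetch("Googlebot", "https://bathindahelper.com/latest"))  # False
```

So not listing a bot in robots.txt does not block it; only an explicit Disallow rule (or something outside robots.txt, like a blocked-crawler setting) does.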
I totally forgot that. You should select it. If you don't use that, then you must check and edit robots.txt manually to be sure it guides bots as you want.
But you can take a look and see if there is something there that would stop Google.
Ok.
That means all Discourse users would (normally) need to specify/provide a robots.txt file.
So I'll read the topic about this (how the file works and what should be in it) in detail tomorrow.
Second, if it's not too much to explain, can you tell me any easy way by which I could tinker with some settings in my Discourse admin panel and at the same time check live/in real time whether Google is now able to access (and then index) my site freely, or whether it is still getting the 'Access Forbidden (403)' error?
Edit: though I'll also try to find similar resources on Google myself now/later.
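One quick live check is to request the site while presenting Googlebot's published user-agent string and look at the HTTP status code. This only exercises user-agent-based blocking (the request obviously doesn't come from Google's own servers), but that is exactly the layer the crawler settings act on. A rough Python sketch:

```python
import urllib.error
import urllib.request

# Googlebot's published desktop user-agent string.
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

req = urllib.request.Request(
    "https://bathindahelper.com/",
    headers={"User-Agent": GOOGLEBOT_UA},
)
try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        print("Status:", resp.status)   # 200: the crawler gets through
except urllib.error.HTTPError as e:
    print("Status:", e.code)            # 403: it is still being blocked
```

Re-running this after each settings change gives immediate feedback instead of waiting days for Google to re-crawl.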
Well, no. It means that admins normally keep robots.txt enabled precisely to avoid manual tinkering. But sure, the blocked-bots list etc. are things an admin may want to modify.
Here I entered these two values, google and google.com, yesterday as an experiment. I don't know whether this takes priority over 'Blocked Crawler User Agents' or not, or whether it has fixed my problem or not (because Google says it has queued my crawl/index request, which might take up to 2-3 days):
But because of the many other (small) issues affecting Google indexing, explained in the first post of this very meta topic, I'd like to keep the topic open.
Also, I'd be obliged if someone could tell me what happens if I've blocked Crawler-1 of a site under 'Blocked Crawler User Agents' and at the same time allowed the same one under 'Allowed Crawler User Agents'.
And what if I've allowed it under 'Allowed Crawler User Agents' but blocked it through robots.txt? Which one DOES take priority?
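Short of reading the Discourse source, one empirical way to find out which layer wins is to check both at once: ask the published robots.txt for its verdict, then see what the server actually returns for the same crawler. A sketch along the lines of the one above:

```python
import urllib.error
import urllib.request
from urllib.robotparser import RobotFileParser

SITE = "https://bathindahelper.com"
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

# 1) What does the published robots.txt say for Googlebot?
rp = RobotFileParser(SITE + "/robots.txt")
rp.read()
print("robots.txt allows Googlebot:", rp.can_fetch("Googlebot", SITE + "/"))

# 2) What does the server actually do when Googlebot's UA shows up?
req = urllib.request.Request(SITE + "/", headers={"User-Agent": GOOGLEBOT_UA})
try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        print("HTTP status for Googlebot UA:", resp.status)
except urllib.error.HTTPError as e:
    print("HTTP status for Googlebot UA:", e.code)
```

If robots.txt says allowed but the server still returns 403, the user-agent lists are overriding it; if robots.txt disallows but the request succeeds, the lists win the other way around.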
Finally!!! I seem to have overcome the 'forbidden' error for the main/home page and individual topics, with 90% help from your side and 10% experimenting on my side. A big thank you.
After removing 'Compatible' from the 'Blocked Crawlers' list, I found a note under another setting which (stupid of me to ignore) was essentially asking users not to fill in any value under 'Allowed Crawler User Agents' unless you're pretty sure what you're doing. So here it was! Ignoring the warning written in caps brought me so many months of Google ignoring my site and so much trouble:
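For anyone hitting the same wall: if, as the behaviour here suggests, these settings do case-insensitive substring matching on the user-agent string, then blocking the word 'compatible' blocks virtually every major search crawler, because their user agents all carry a '(compatible; ...)' token. A tiny illustration (the matching rule is my assumption, not confirmed from the Discourse source):

```python
# Published user-agent strings of two major crawlers.
CRAWLER_UAS = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                 "+http://www.google.com/bot.html)",
    "Bingbot":   "Mozilla/5.0 (compatible; bingbot/2.0; "
                 "+http://www.bing.com/bingbot.htm)",
}

# Assumed rule: a crawler is blocked if the blocked term appears
# anywhere in its user-agent string, case-insensitively.
blocked_term = "compatible"
for name, ua in CRAWLER_UAS.items():
    verdict = "BLOCKED" if blocked_term.lower() in ua.lower() else "allowed"
    print(f"{name}: {verdict}")
# Both print BLOCKED: "compatible" sits inside both user-agent strings.
```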