Malformed robots.txt causing issues with indexing

Hi everyone,

We just realised that our Discourse forum is not indexed by Google (we remember that it was indexed about a year ago), and we’re trying to fix it right now. Which settings do we need to make sure are configured properly?

This is what I’ve done so far:

  1. I’ve made sure that “allow index in robots txt” is ticked

  2. I’ve added the following domains to “exclude rel nofollow domains”:

    • grakn.ai (our main site domain)
    • discuss.grakn.ai (our discourse forum domain)

  3. I’ve made sure that “add rel nofollow to user content” is unticked

  4. I’ve added Googlebot to “whitelisted crawler user agents”

Am I missing any other settings that I need to configure?

Our Google Search Console shows that discuss.grakn.ai still cannot be crawled because it is blocked by robots.txt - see screenshot below.

Thanks in advance for the help!!!


Admin -> Settings -> Enable Robots.txt

Your forum's robots.txt file is accessible: https://discuss.grakn.ai/robots.txt

Log in to Google Webmaster Tools and check: https://www.google.com/webmasters/tools/robots-testing-tool


Out of the box, with all defaults, this works totally fine. Did you modify these settings when you originally installed?


The robots.txt file has this text in the middle, so crawlers might have problems with it:

User-agent: *
Disallow: /
Noindex: /

Google is indexing pages though:
https://www.google.com/search?q=site%3Ahttps%3A%2F%2Fdiscuss.grakn.ai%2F&num=100

It might be that Googlebot is looking at your Google-specific rules and Webmaster Tools is warning you about the wildcard.

(I’m not sure what settings result in that robots.txt output.)
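To illustrate how a crawler-specific group can take precedence over the wildcard group, here is a minimal sketch using Python's standard `urllib.robotparser`. The `Googlebot` group and the `/admin/` path in the sample file are assumptions made for the example; only the wildcard block is quoted from the actual robots.txt. Python's parser is also not Google's, so treat the result as an approximation of how Googlebot would read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the Googlebot group is an assumption,
# the wildcard group is the block quoted in this thread.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /admin/

User-agent: *
Disallow: /
Noindex: /
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())


def allowed(user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the sample rules."""
    return parser.can_fetch(user_agent, "https://discuss.grakn.ai" + path)


# Googlebot matches its own group, so the wildcard block does not apply to it.
print(allowed("Googlebot", "/latest"))       # True
print(allowed("Googlebot", "/admin/users"))  # False
# Any crawler without its own group falls back to "User-agent: *".
print(allowed("SomeOtherBot", "/latest"))    # False
```

With a file shaped like this, Googlebot could still crawl topic pages while a generic check against the wildcard group would report the site as blocked.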


Yes.

  1. Access: https://discuss.grakn.ai/admin/customize/robots

  2. Remove:

    User-agent: *
    Disallow: /
    Noindex: /

  3. Go to Google Webmaster Tools: https://www.google.com/webmasters/tools/robots-testing-tool

Choose a verified property and submit robots.txt again to Google.

I think it should work.
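Before resubmitting, the edited file can be checked locally for any remaining blanket block. The sketch below is a simplified reader, not a full robots.txt parser, and `blanket_blocked_agents` is a hypothetical helper name. The `Googlebot` group in the sample text is an assumption; only the wildcard block comes from this thread.

```python
from __future__ import annotations


def blanket_blocked_agents(robots_txt: str) -> list[str]:
    """List the user agents whose group contains 'Disallow: /'."""
    blocked: list[str] = []
    agents: list[str] = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # A User-agent line that follows a rule starts a new group.
            if seen_rule:
                agents = []
                seen_rule = False
            agents.append(value)
        elif field in ("allow", "disallow"):
            seen_rule = True
            if field == "disallow" and value == "/":
                blocked.extend(a for a in agents if a not in blocked)
    return blocked


# Hypothetical file contents before and after removing the block.
before = "User-agent: Googlebot\nDisallow: /admin/\n\nUser-agent: *\nDisallow: /\nNoindex: /\n"
after = "User-agent: Googlebot\nDisallow: /admin/\n"

print(blanket_blocked_agents(before))  # ['*']
print(blanket_blocked_agents(after))   # []
```

An empty list means no group blocks the whole site, which is the state you want before asking Google to re-read the file.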


Finally, removing the following block fixed the problem.

User-agent: *
Disallow: /
Noindex: /

Thank you so much, @j127 and @tohaitrieu!!!

Google Search Console now shows that discuss.grakn.ai is queued up for indexing.

Cheers!


I’m very unclear how you ended up in this state. Did you change default site settings related to crawling?


I’m also unclear how we ended up in this state, @codinghorror. I’ve been the admin of the site for the past year and I did not change any of the settings mentioned above. I do remember going a very long time without upgrading, and then upgrading shortly before the issue started occurring, but I don’t know if that’s related.
