Faulty robots.txt causes problems with indexing

Hi everyone,

We just realised that our Discourse forum is not indexed by Google (we remember that it was indexed about a year ago), and we’re trying to fix it right now. What are the configurations that we need to make sure are set properly?

This is what I’ve done so far:

  1. I’ve made sure that “allow index in robots txt” is ticked

  2. I’ve added the following domains to “exclude rel nofollow domains”:

    • grakn.ai (our main site domain)
    • discuss.grakn.ai (our discourse forum domain)
  3. I’ve made sure that “add rel nofollow to user content” is unticked

  4. I’ve added Googlebot to “whitelisted crawler user agents”

Am I missing any other configurations that I need to set?

Our Google Search Console shows that discuss.grakn.ai still cannot be crawled because it is blocked by robots.txt (see screenshot below).

Thanks in advance for the help!!!

Admin -> Settings -> Enable Robots.txt

Your forum’s robots.txt file is allowed: https://discuss.grakn.ai/robots.txt

Log in to Google Webmaster Tools and check: https://www.google.com/webmasters/tools/robots-testing-tool

Out of the box with all defaults this works totally fine, did you modify these settings when you originally installed?

The robots.txt file has this block in the middle, so crawlers might have problems with it:

User-agent: *
Disallow: /
Noindex: /
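To see why that block is a problem, here is a minimal sketch using Python’s standard-library `urllib.robotparser` (the topic URL is a made-up example): `User-agent: *` with `Disallow: /` denies every path to every crawler, Googlebot included, and the non-standard `Noindex:` line is simply ignored by most parsers.

```python
from urllib import robotparser

# The problematic block quoted above, as parsers see it.
rules = """\
User-agent: *
Disallow: /
Noindex: /
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Both a generic crawler and Googlebot are blocked from everything.
print(parser.can_fetch("*", "https://discuss.grakn.ai/"))                      # False
print(parser.can_fetch("Googlebot", "https://discuss.grakn.ai/t/some-topic"))  # False
```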

Google is indexing pages though:
https://www.google.com/search?q=site%3Ahttps%3A%2F%2Fdiscuss.grakn.ai%2F&num=100

It might be that Googlebot is looking at your Google-specific rules and Webmaster Tools is warning you about the wildcard.

(I’m not sure what settings result in that robots.txt output.)

Yes.

  1. Access: https://discuss.grakn.ai/admin/customize/robots

  2. Remove:

    User-agent: *
    Disallow: /
    Noindex: /

  3. Go to Google Webmaster Tools: https://www.google.com/webmasters/tools/robots-testing-tool

Choose a verified property and submit robots.txt again to Google.

I think it should work.
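Once the blanket block is removed, you can sanity-check the result the same way before resubmitting to Google. The rules below are a hypothetical simplified example (not the exact Discourse defaults), restricting only the admin path:

```python
from urllib import robotparser

# Hypothetical corrected robots.txt: no blanket "Disallow: /",
# only a restricted admin path remains.
rules = """\
User-agent: *
Disallow: /admin/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Public topics are now crawlable; the admin area stays blocked.
print(parser.can_fetch("Googlebot", "https://discuss.grakn.ai/t/some-topic"))  # True
print(parser.can_fetch("Googlebot", "https://discuss.grakn.ai/admin/"))        # False
```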

Finally, removing the following block fixed the problem.

User-agent: *
Disallow: /
Noindex: /

Thank you so much, @j127 and @tohaitrieu!!!

Google Search Console now shows that discuss.grakn.ai is queued up for indexing.

Cheers!

I’m very unclear how you ended up in this state. Did you change default site settings related to crawling?

I’m also unclear how we ended up in the above state, @codinghorror. I’ve been the admin of the site for the past year and I did not change anything related to the settings above. I do remember going without an upgrade for a long time and then doing one shortly before this issue started occurring, but I don’t know if that’s related.