How to block all crawlers but Google's


About 1/3 of our traffic is from crawlers (about 250K last month). Is there a way to block these but allow Google’s crawlers?


Maybe you should edit your robots.txt file with:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

Note that Google's crawler identifies itself as Googlebot, not Google, and an empty Disallow line means "allow everything" for that user agent.

Source: The Web Robots Pages
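To sanity-check rules like these before deploying them, you can feed them to Python's standard-library robots.txt parser. This is just a sketch of the rules suggested above (the URL path is made up for illustration); keep in mind that robots.txt only works against well-behaved bots that actually honor it.

```python
# Verify the proposed robots.txt rules with Python's stdlib parser.
import urllib.robotparser

rules = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot matches its own group, whose empty Disallow allows everything.
print(parser.can_fetch("Googlebot", "/t/some-topic/123"))  # True
# Any other bot falls through to the * group and is blocked site-wide.
print(parser.can_fetch("Bingbot", "/t/some-topic/123"))    # False
```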


Thank you @SidV! That put me on track :+1:

Though, now I run into my limited understanding of Discourse: where can I add this? It seems there is no robots.txt file I can edit.

(cpradio) #4

IIRC, there is an outlet you can use in a plugin. Let me see if I can find it.


and here is an example of using it in a plugin:

Needing to edit robots.txt file - where is it?

Thanks! I already found those but as a non-programmer that doesn’t help me.

Neither does the How do I install a plugin? explanation.

I guess I’m looking for an edit-discourse-101…

(cpradio) #6

Going from the Sitemap plugin, you need a similar plugin.rb file (bare bones though)

# name: Disable Bots
# about:
# version: 1.0
# authors: whoever
# url:

PLUGIN_NAME = "discourse-disable-bots".freeze

enabled_site_setting :disable_bots_enabled

A config/settings.yml file

plugins:
  disable_bots_enabled:
    default: false

A config/locales/server.en.yml

en:
  site_settings:
    disable_bots_enabled: 'Enable Disable Bots'

A app/views/connectors/robots_txt_index/disable_bots.html.erb

<%- if SiteSetting.disable_bots_enabled? %>
<%- unless SiteSetting.login_required? %>
Disallow: /

User-agent: Googlebot
Disallow:
<% end %>
<% end %>

For the last one, I'm not sure whether that will work. The idea is that it appends Disallow: / under the already existing User-agent: * block, but that could be a wrong assumption, and you may have to specify the user agent again, such as:

<%- if SiteSetting.disable_bots_enabled? %>
<%- unless SiteSetting.login_required? %>
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
<% end %>
<% end %>
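For what it's worth, if the connector output is appended after Discourse's stock rules as hoped, the served robots.txt should end up looking roughly like this (a sketch, assuming nothing else writes to the file):

```
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
```

Crawlers pick the most specific matching group, so Googlebot would follow its own (empty, allow-all) group while every other bot falls under the blanket Disallow: /.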


It’s not so easy.
If you choose that route (editing robots.txt), you would need to contact your hosting provider and add the files @cpradio described. And :warning: please take this as a warning: if you update/upgrade Discourse later, all the modifications will be gone, and you will have to apply them again after every update/upgrade.

Sorry for the intrusion, but why are you worried about crawler traffic on your website?


Well, the traffic from the crawlers makes us exceed our hosting plan, so I was looking for an easy solution.

But it seems there is none, haha.

Thank you for your input though!

(Mittineague) #9

Would it be easier to just make the forum login required?
(search bots don’t log in)

(Rafael dos Santos Silva) #10

That blocks everyone, and for that we have a better suited setting: allow index in robots txt.

He wants to block all but a specific bot, and that’s not easily done.