About 1/3 of our traffic is from crawlers (about 250K last month). Is there a way to block these but allow Google’s crawlers?
Maybe you should edit your robots.txt file with:
User-agent: Google
Disallow:

User-agent: *
Disallow: /
Source: The Web Robots Pages
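To make the effect of those two records concrete, here is a minimal Ruby sketch of how a crawler could evaluate them. It is a deliberately simplified check, not a full robots.txt parser: the helper names `parse_groups` and `allowed?` are made up for illustration, and agent matching is a case-insensitive substring test, as the original robots.txt convention recommends. Google's documentation lists `Googlebot` as the token for its main crawler, so `User-agent: Googlebot` may be the safer spelling for stricter parsers.

```ruby
# Simplified robots.txt check. NOT a full parser: no Allow lines, no
# wildcards, no longest-match rules. Helper names are hypothetical.
ROBOTS_TXT = <<~TXT
  User-agent: Google
  Disallow:

  User-agent: *
  Disallow: /
TXT

# Parse into { "agent token" => [disallowed path prefixes] }.
def parse_groups(text)
  groups = {}
  current = []      # agent tokens of the record being read
  in_rules = false  # true once the current record has seen a rule line
  text.each_line do |line|
    line = line.sub(/#.*/, "").strip
    next if line.empty?
    field, value = line.split(":", 2).map(&:strip)
    case field.downcase
    when "user-agent"
      current = [] if in_rules  # a User-agent after rules starts a new record
      in_rules = false
      current << value.downcase
      groups[value.downcase] ||= []
    when "disallow"
      in_rules = true
      # An empty Disallow value means "nothing is disallowed".
      current.each { |agent| groups[agent] << value unless value.empty? }
    end
  end
  groups
end

# A specific record wins over "*"; matching is a substring test (assumption).
def allowed?(groups, user_agent, path)
  ua = user_agent.downcase
  token = groups.keys.find { |agent| agent != "*" && ua.include?(agent) }
  rules = groups[token || "*"] || []
  rules.none? { |prefix| path.start_with?(prefix) }
end

groups = parse_groups(ROBOTS_TXT)
puts allowed?(groups, "Googlebot/2.1", "/t/topic/1")  # true
puts allowed?(groups, "bingbot/2.0", "/t/topic/1")    # false
```

Keep in mind that robots.txt is only advisory: well-behaved crawlers follow it, but nothing forces a bot to honour it.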
Thank you @SidV! That put me on track
Though now I run into my limited understanding of Discourse: where can I add this? It seems there is no robots.txt I can edit.
IIRC, there is an outlet you can use in a plugin. Let me see if I can find it.
and here is an example of using it in a plugin:
Thanks! I already found those but as a non-programmer that doesn’t help me.
Neither does the How do I install a plugin? explanation.
I guess I’m looking for an edit-discourse-101…
Going from the Sitemap plugin, you need a similar plugin.rb file (bare bones though)
# name: Disable Bots
# about:
# version: 1.0
# authors: whoever
# url: https://github.com/your_github_account/your_repo_name

PLUGIN_NAME = "discourse-disable-bots".freeze

enabled_site_setting :disable_bots_enabled
A config/settings.yml file
plugins:
  disable_bots_enabled:
    default: false
A config/locales/server.en.yml file
en:
  site_settings:
    disable_bots_enabled: 'Enable Disable Bots'
And a template for the robots.txt outlet
<%- if SiteSetting.disable_bots_enabled? %>
<%- unless SiteSetting.login_required? %>
Disallow: /

User-agent: Google
Disallow:
<% end %>
<% end %>
For the last one, I’m not sure if that will work or not. The thought is it will be appending Disallow: / under the already existing User-agent: *, but that could be a wrong assumption and you may have to specify it again, such as
<%- if SiteSetting.disable_bots_enabled? %>
<%- unless SiteSetting.login_required? %>
User-agent: *
Disallow: /

User-agent: Google
Disallow:
<% end %>
<% end %>
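If you want to see what such a template fragment emits before putting it in a plugin, a rough way is to render it with Ruby's standard ERB library. This is only a sketch under assumptions: the `SiteSetting` module below is a hypothetical stand-in for Discourse's real one, the setting name `disable_bots_enabled` is the one from the plugin files above, and Discourse's own rendering may treat whitespace differently. It also says nothing about where Discourse places the fragment inside its generated robots.txt.

```ruby
require "erb"

# Hypothetical stand-in for Discourse's SiteSetting; the values are assumptions.
module SiteSetting
  def self.disable_bots_enabled?
    true
  end

  def self.login_required?
    false
  end
end

# The fragment that repeats "User-agent: *" explicitly.
TEMPLATE = <<~TEMPLATE
  <%- if SiteSetting.disable_bots_enabled? %>
  <%- unless SiteSetting.login_required? %>
  User-agent: *
  Disallow: /

  User-agent: Google
  Disallow:
  <%- end %>
  <%- end %>
TEMPLATE

# trim_mode "-" is needed so that the "<%-" tags are accepted by stdlib ERB.
output = ERB.new(TEMPLATE, trim_mode: "-").result
puts output
```

Flipping either stub method (disabling the setting, or making login required) should leave the rendered text without any of the rules, which is a quick way to check the conditions are nested the way you intend.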
It’s not so easy.
If you go that way (editing the robots.txt), you should contact your hosting provider and edit the files @cpradio mentioned. And please take this as a warning: if you upgrade Discourse later, all the modifications will be gone, and you will have to make them again after every update or upgrade.
Sorry for the intrusion, but why are you worried about the traffic that robots consume on your website?
Well the traffic from the crawlers makes us exceed our plan so I was looking for an easy solution.
But it seems there is none haha.
Thank you for your input though!
Would it be easier to just make the forum login required?
(search bots don’t log in)
That blocks everyone, and for that we have a better suited setting: allow index in robots txt.
He wants to block all but a specific bot, and that’s not easily done.