If you haven’t customized your `robots.txt` file you won’t need to do anything… `disallow` is already doing most of the work.
By default Discourse uses both `disallow` and `noindex` in `robots.txt`.
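
For reference, this is roughly what such rules look like in a `robots.txt` file (the path here is illustrative, not the exact default Discourse ships):

```text
# Illustrative robots.txt rules (the path is an example, not the exact Discourse default)
User-agent: *
Disallow: /admin/
Noindex: /admin/
```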
In the blog post about this update, Google suggests using `disallow`, which we already do. We use `noindex` in addition to help avoid this linking issue Google mentions (I added emphasis to the relevant bit)…
> Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled usually means its content won’t be indexed. **While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.**
On our end we’ll look at making an update to add the `noindex` meta tag or use the `X-Robots-Tag` header in our HTTP responses to make sure Google’s not indexing the link when it appears on other pages (we’ll update this topic with any changes).
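
As a rough sketch, those two alternatives look like this (values are illustrative, not a finalized Discourse change):

```text
<!-- noindex meta tag placed in the page's <head> -->
<meta name="robots" content="noindex">

# equivalent X-Robots-Tag HTTP response header
X-Robots-Tag: noindex
```

Either form tells Google not to index the page even when it’s reached via links from other pages.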
If you’ve added custom `noindex` rules to `robots.txt` via your `/admin/customize/robots` admin page, you should change them to `disallow`.
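
For example (the `/some-path/` value here is hypothetical, swap in whatever paths your custom rules cover):

```text
# Hypothetical custom rule entered at /admin/customize/robots
# before:
Noindex: /some-path/
# after:
Disallow: /some-path/
```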