Why are there so many Disallow rules in robots.txt?

sam · December 22, 2020, 02:50

Just reviving this discussion.

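These days you can freely edit the robots.txt file if you want to.
For pages that should not be indexed, we always serve an X-Robots-Tag: noindex header.
It turns out that without strict guidance in robots.txt, some crawlers will run rampant over a site; not every crawler is as well behaved as Google. Our robots.txt file is very basic now, but that has come at a cost. (We would like every crawler to be as well behaved as Google, but being Google takes an enormous amount of effort.)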
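
For anyone wondering what the noindex header looks like from the crawler's side, here is a minimal sketch in Python. The function name `is_noindex` is made up for illustration, and it only handles the plain comma-separated form of the header, not values prefixed with a specific bot name.

```python
def is_noindex(headers):
    """Return True if the X-Robots-Tag header asks crawlers not to index the page."""
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    value = normalised.get("x-robots-tag", "")
    # The header carries a comma-separated list of directives.
    directives = {part.strip().lower() for part in value.split(",")}
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives
```

A well-behaved crawler still has to fetch the page to see this header, which is why it does nothing to reduce the load from crawlers that hit everything.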

I think we should at least go back to a "very strict" robots.txt by default for everything that is not Googlebot.
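
To make the idea concrete, here is a rough sketch of what "strict for everyone except Googlebot" could look like, checked with Python's standard `urllib.robotparser`. The rules, the `/admin/` path, the topic path, and the `SomeOtherBot` name are all assumptions for illustration, not the robots.txt we actually ship.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: Googlebot may crawl everything except /admin/,
# every other crawler is told to stay away from the whole site.
STRICT_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /admin/

User-agent: *
Disallow: /
"""


def can_fetch(user_agent, path, robots_txt=STRICT_ROBOTS_TXT):
    """Ask the standard-library parser whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(can_fetch("Googlebot", "/t/some-topic/123"))     # True
print(can_fetch("Googlebot", "/admin/users"))          # False
print(can_fetch("SomeOtherBot", "/t/some-topic/123"))  # False
```

This only helps with crawlers that actually read and obey robots.txt; anything that ignores the file has to be handled some other way.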
