Disabling or bypassing feature detection for Googlebot (while serving the JS app to crawlers)

stance455 · May 16, 2022, 15:59

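I'm starting to think my logic was wrong from the start. That would explain why nobody has responded: maybe there is no real problem here at all.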

Here is a recent article explaining that it is normal for Google to show a blank page in the screenshot:

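I can now see the "crawled" HTML for the homepage. This is the indexed version, not the one from the "live test", and it shows the full page. Keep in mind that Google worked this out while being served the full JS app.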

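Interestingly, as far as indexing goes, they scrolled down to roughly the 27th topic on the homepage. So infinite scroll is something Google can handle.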

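I'm not sure whether this helps, but I unchecked the ajax option in the admin settings. While it was enabled, Google was finding URLs like the one below (and was being served the crawler version). With it unchecked, that URL now shows the JS version: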

https://discuss.flynumber.com/t/japan-phone-numbers-disconnect-notice/2351?_escaped_fragment_=
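
For anyone unfamiliar with the `?_escaped_fragment_=` suffix: it comes from Google's old, now deprecated, AJAX crawling scheme, where a crawler asks for a pre-rendered snapshot by adding that query parameter. Below is a minimal sketch of how a server could detect it and pick which version to serve. This is not Discourse's actual code; the function names `requests_crawler_snapshot` and `choose_variant` are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Query parameter defined by Google's deprecated AJAX crawling scheme.
ESCAPED_FRAGMENT = "_escaped_fragment_"


def requests_crawler_snapshot(url: str) -> bool:
    """Return True if the URL carries the _escaped_fragment_ parameter."""
    query = urlsplit(url).query
    # keep_blank_values=True matters here: the parameter usually arrives
    # with an empty value ("?_escaped_fragment_="), which parse_qs would
    # otherwise drop.
    params = parse_qs(query, keep_blank_values=True)
    return ESCAPED_FRAGMENT in params


def choose_variant(url: str) -> str:
    """Hypothetical routing decision: 'crawler' (static HTML) or 'js' (full app)."""
    return "crawler" if requests_crawler_snapshot(url) else "js"


if __name__ == "__main__":
    base = "https://discuss.flynumber.com/t/japan-phone-numbers-disconnect-notice/2351"
    print(choose_variant(base + "?_escaped_fragment_="))
    print(choose_variant(base))
```

If the site stops advertising the scheme, crawlers should stop requesting such URLs, which would be consistent with what I saw after unchecking the option.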
