Aggressive crawling by Amazonbot

Hi there, I wanted to report some aggressive crawling by a bot with the following user agent:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)

It seems to be a bot run by Amazon, but I couldn’t check the originating IP addresses to confirm that.
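One way to confirm whether traffic really comes from the claimed crawler is a reverse-DNS lookup on the client IP, followed by a forward lookup to make sure the hostname resolves back to the same IP (the same scheme Google documents for Googlebot). A minimal sketch — the expected hostname suffix `crawl.amazonbot.amazon` is my reading of Amazon’s Amazonbot page, so treat it as an assumption and check the linked page for the authoritative value:

```python
import socket

def hostname_matches(hostname: str, suffix: str) -> bool:
    """True if hostname ends with the given domain suffix, aligned on a label boundary."""
    hostname = hostname.rstrip(".").lower()
    suffix = suffix.lstrip(".").lower()
    return hostname == suffix or hostname.endswith("." + suffix)

def verify_crawler_ip(ip: str, suffix: str = "crawl.amazonbot.amazon") -> bool:
    """Reverse-DNS the IP, check the suffix, then forward-resolve to confirm the IP."""
    try:
        hostname, _aliases, _ips = socket.gethostbyaddr(ip)  # reverse lookup
    except OSError:
        return False
    if not hostname_matches(hostname, suffix):
        return False
    try:
        # Forward-confirm: the claimed hostname must resolve back to the original IP,
        # otherwise anyone controlling reverse DNS for their IP could spoof the name.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False
```

The label-boundary check in `hostname_matches` matters: it rejects lookalikes such as `crawl.amazonbot.amazon.evil.example` that merely contain the expected suffix.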

This is what the last 5 days look like:
[screenshot “crawler3”: Amazonbot request volume over the last 5 days]

For comparison, this is our user agents table for the last two days: 39,649 vs 457 requests.

I personally don’t care too much about this, as we’re not the ones doing the hosting and we haven’t noticed performance issues — but CDCK is hosting us, so I figured this could be interesting to share here.


Can we double-check this, @dax?


From our site and container logs, it appears there was a spike only on that particular day, and only on that site.

May 1st:

| Client IP | Requests (Amazonbot*) |
| --- | --- |
| 107.23.182.118 | 3,560 |
| 54.90.49.0 | 3,210 |
| 35.175.129.27 | 3,204 |
| 3.80.18.217 | 2,646 |
| 35.153.79.214 | 2,529 |
| 34.201.164.175 | 2,432 |
| 107.21.55.67 | 1,959 |
| 34.204.61.165 | 1,538 |
| 18.208.120.81 | 1,473 |
| 100.25.191.160 | 1,276 |

* Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)


I see. Thanks for checking it. Probably a technical user having a bad day, pointing a badly behaved bot at our website to no real effect. We’ve since blocked that crawler.
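For anyone self-hosting who wants to do the same: Discourse can block crawlers by user agent from the admin panel. A sketch, assuming the site setting is named `blocked_crawler_user_agents` (verify the exact name in your own admin UI):

```text
# Admin → Settings → search “crawler” (value illustrative)
blocked_crawler_user_agents: amazonbot
```

Matching is against the user agent string, so this works even when the bot comes from many IPs.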


Since I just ran into something similar myself…

I’m happy for Alexa to use my site’s content to answer questions, so I don’t actually want to block it. But I’ve just seen three days of very heavy traffic from AmazonBot (relative to all other site usage — including all other bots, and the site’s overall traffic), and I see Amazon says:

AmazonBot does not support the crawl-delay directive in robots.txt

So adding Amazonbot to slow_down_crawler_user_agents seems wise, so that it doesn’t have an outsized impact on site performance for real users.
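A sketch of what that looks like as site settings — the setting names are as I understand them from Discourse’s admin panel, and the rate value is illustrative, not a recommendation:

```text
# Admin → Settings → search “slow down” (rate value illustrative)
slow_down_crawler_user_agents: amazonbot
slow_down_crawler_rate: 60   # minimum seconds between requests from matching crawlers
```

Unlike a robots.txt Crawl-delay (which Amazonbot ignores per the quote above), this throttling is enforced server-side, so the crawler has no choice in the matter.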

Thanks to the Discourse folks for implementing the feature the crawler should have had, but in this case doesn’t. :heart:
