We had a huge jump in web crawler activity today, and I was wondering if anyone else has experienced this and has any ideas on why it would happen? It seems like a lot of resources for a single activity. Is there a separate robots.txt file for the forums?
Hard to say unless you look at the logs for that day. There are a surprising number of very poorly written crawlers out there. Google’s is uniformly excellent, as you would expect, but everyone else, including Microsoft (Bing), is kind of riding around the Internet in brightly painted clown cars, crashing into stuff.
Thanks. I’ve never looked at the logs. It’s not clear to me which of these logs helps me with this task — or are you talking about logs on the DigitalOcean server?
We would be talking about the NGINX logs, which are stored on the box. Analyzing them is not a trivial task; we do some of that kind of work in the performance report. At the very least you could pick out which IP is hitting you hard.
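If you’re comfortable on the command line, a quick first pass over the access log can surface the heavy hitters. This is just a sketch assuming the stock NGINX log path (`/var/log/nginx/access.log`) and the default combined log format; your install may log somewhere else (e.g. inside a container).

```shell
# Count requests per client IP and show the top 10 heaviest sources.
# Path is the stock NGINX default; adjust for your install.
awk '{print $1}' /var/log/nginx/access.log \
  | sort | uniq -c | sort -rn | head -10

# Same tally, restricted to requests whose user agent looks like a crawler.
grep -iE 'bot|crawler|spider' /var/log/nginx/access.log \
  | awk '{print $1}' | sort | uniq -c | sort -rn | head -10
```

Once you have a suspect IP, a reverse lookup (`host <ip>`) tells you whether it actually belongs to the crawler its user agent claims, since badly behaved bots often fake the Googlebot string.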
Yes, you can turn on the auto reports via a site setting. Search your site settings for nginx.