Heard on another thread @simon mention that there is a setting to stop/smash crawlers. I can’t find this setting. Does anyone know where it is and how it works?
Had an unusual spike of 48 crawlers at my site on July 14th. I don’t like these little buggers. What are they doing?
Thanks, found that now. I see there are five crawler names blocked by default; I guess they are known problem spiders.
Looks like there is an option to use an allow-list for good crawlers, which if used automatically denies entry to all other electric bug beings. I don’t know what the good ones may be, though.
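To make the allow-list versus block-list behaviour concrete, here is a rough Python sketch of the logic as described in this thread. It is not the forum software's actual code: the function name `crawler_allowed` and the example agent names are made up for illustration, and the case-insensitive substring match is an assumption.

```python
def crawler_allowed(user_agent, allow_list=(), block_list=()):
    """Decide whether a crawler may enter (hypothetical helper).

    If allow_list is non-empty, only agents matching it get in and
    everything else is denied. Otherwise only agents matching
    block_list are denied.
    """
    ua = user_agent.lower()
    if allow_list:
        return any(name.lower() in ua for name in allow_list)
    return not any(name.lower() in ua for name in block_list)


# Block-list only: the named spider is refused, others pass.
print(crawler_allowed("BadSpider/2.0", block_list=["badspider"]))      # False
print(crawler_allowed("Googlebot/2.1", block_list=["badspider"]))      # True

# Allow-list in use: anything not listed is refused automatically.
print(crawler_allowed("Googlebot/2.1", allow_list=["googlebot"]))      # True
print(crawler_allowed("CensysInspect/1.1", allow_list=["googlebot"]))  # False
```

The point of the sketch is the second case: once an allow-list exists, a crawler you forgot to list is shut out, so it pays to decide which search engines matter before switching it on.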
Found the web crawler user agent reports; the top ones are "YandexBox/3.0" and "CensysInspect/1.1".
It would be good to show up on some search engines so customers can find the site, so I need those crawlers.
Have been getting a lot of calls from marketing companies that charge a subscription for help with that and with website building. Could be good, but all these calls are kind of annoying.
Worth noting that user agents are totally easy to spoof. If it’s Google, you can be pretty sure it will say so. But just because it says it’s Google, that means nothing.
(Same situation as robots.txt, these are mechanisms which assume trust. Untrustworthy parties can just play by different rules.)
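To show how little the user agent proves, here is a small Python sketch using the standard library's `urllib.request`. It only builds a request and never sends it; the URL is a placeholder and the Googlebot-style string is just an example value.

```python
import urllib.request

# The User-Agent is an ordinary header that the client fills in itself,
# so any script can claim to be any crawler it likes.
claimed_agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Build (but do not send) a request carrying the claimed identity.
req = urllib.request.Request(
    "https://example.com/",  # placeholder URL
    headers={"User-Agent": claimed_agent},
)

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Nothing on the server side can tell this header apart from the real thing, which is why allow-lists and block-lists keyed on user agent only work against crawlers that are honest about who they are.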
Sneaky imposters even worse than weasels they can be:
"# How to check that a robot belongs to Yandex
Some robots can disguise themselves as Yandex robots by indicating the relevant User Agent. You can check the authenticity of a robot using reverse DNS lookup.
Use a reverse DNS lookup of the IP address to determine the host domain name.
Check whether the host belongs to Yandex. All Yandex robot names end in yandex.ru, yandex.net or yandex.com. If the host name has a different ending, the robot does not belong to Yandex.
Make sure that the name is correct. Use a forward DNS lookup to get the IP address corresponding to the host name. It should match the IP address used in the reverse DNS lookup. If the IP addresses do not match it means that the host name is fake."
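The three steps quoted above can be sketched in Python with the standard `socket` module. This is a minimal sketch, not an official Yandex tool: the function names are my own, the lookups are passed in as parameters so the demo can run offline, and the demo host names and IP addresses are invented (the IPs come from the reserved documentation ranges).

```python
import socket

# Suffixes taken from the quoted Yandex text; the leading dot keeps
# look-alikes such as "notyandex.ru" from matching.
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def reverse_lookup(ip):
    """Step 1: reverse DNS. Return the host name for ip, or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def forward_lookup(host):
    """Step 3: forward DNS. Return the set of IPs the host resolves to."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(host, None)}
    except OSError:
        return set()


def is_genuine_crawler(ip, suffixes=YANDEX_SUFFIXES,
                       reverse=reverse_lookup, forward=forward_lookup):
    """Return True only if all three checks from the quoted text pass."""
    host = reverse(ip)
    if host is None:
        return False
    host = host.rstrip(".").lower()
    # Step 2: the host name must end in one of the expected domains.
    if not host.endswith(tuple(suffixes)):
        return False
    # Step 3: the name must resolve back to the IP we started from.
    return ip in forward(host)


# Offline demo with made-up DNS data standing in for real lookups.
fake_ptr = {
    "192.0.2.10": "spider-1.yandex.com",
    "198.51.100.7": "yandex.com.crawler.example",
    "203.0.113.5": "spider-2.yandex.ru",
}
fake_a = {
    "spider-1.yandex.com": {"192.0.2.10"},
    "spider-2.yandex.ru": {"192.0.2.99"},
}


def fake_forward(host):
    return fake_a.get(host, set())


for ip in fake_ptr:
    print(ip, is_genuine_crawler(ip, reverse=fake_ptr.get, forward=fake_forward))
```

Against a live log you would call `is_genuine_crawler(ip)` with the defaults, which do real DNS queries. The comparison is a plain string match, so an IPv6 address written in a different but equivalent form would need normalizing first. Other search engines publish similar checks with their own domain suffixes, which could be passed in through `suffixes`.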
Do you happen to know if web crawlers like these index sites for voice search systems?
I get a few automated calls every day about my company not being registered with the main AI voice search engines. These seem to be just third-party companies, and I’m not sure whether what they do is legitimate.
Specifically: charging a subscription service to “register” a company on search, or aid in companies showing up on the first page of search results.
I have interacted with some people that I think can actually help with this, some using legitimate methods. They seem to be the exception in my estimation.
This is important to remember, especially these days: scammers are getting more sophisticated.
The calls I get often say “your Google listing has been flagged for review,” which suggests it is Google calling, but Google never calls. Some companies do call themselves a “Google partner company”; not sure what that means, or if that is even a thing.
Heard from one rep that there are at least 40 different companies that do this, calling startup companies to register them on voice search platforms. That explains why there are so many calls.