Web Crawlers

I heard @simon mention on another thread that there is a setting to stop/smash crawlers, but I can’t find it. Does anyone know where it is and how it works?

I had an unusual spike of 48 crawlers on my site on July 14th. I don’t like these little buggers. What are they doing?

Search site settings for “crawler”. You can block crawlers or slow them down by user agent.

At least some of them index your site so that it appears in search engines. You probably do want that.

Also, have a look at the “Web Crawler User Agents” report to get the name of the crawler that’s causing the issue.

Thanks, found that now. I see there are five crawler names blocked by default; I guess they are known problem spiders.

It looks like there is an option to use an allow-list for good crawlers which, if used, automatically denies entry to all other electric bug beings. I don’t know what the good ones may be, though.
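To make the allow-list versus block-list behaviour concrete, here is a minimal Python sketch of the decision logic as described above. It is not Discourse's actual implementation: the function name `crawler_decision` and the case-insensitive substring matching are assumptions made for illustration.

```python
def crawler_decision(user_agent, allowed=(), blocked=()):
    """Return "allow" or "block" for a crawler's User-Agent string.

    Hypothetical helper: matching is a case-insensitive substring test,
    which is an assumption of this sketch.
    """
    ua = user_agent.lower()
    if allowed:
        # A non-empty allow-list denies every crawler that is not on it.
        return "allow" if any(name.lower() in ua for name in allowed) else "block"
    # With no allow-list, only crawlers named on the block-list are denied.
    return "block" if any(name.lower() in ua for name in blocked) else "allow"
```

With an allow-list of, say, `["Googlebot", "bingbot"]`, a crawler announcing itself as `CensysInspect/1.1` would be blocked even though nobody listed it explicitly. That is the trade-off of the allow-list: you have to know every crawler you want up front.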

I found the “Web Crawler User Agents” report. The top ones are YandexBot/3.0 and CensysInspect/1.1.

https://about.censys.io/

It would be good to show up on some search engines, since we need those to reach customers.

I have been getting a lot of calls from marketing companies that charge a subscription for help with that and with website building. It could be useful, but all these calls are kind of annoying.

Worth noting that user agents are totally easy to spoof. If it’s Google, you can be pretty sure it will say so. But just because it says it’s Google doesn’t mean it is.

(Same situation as robots.txt, these are mechanisms which assume trust. Untrustworthy parties can just play by different rules.)
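As a rough illustration of why robots.txt relies on trust, here is a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the crawler names `BadBot` and `FriendlyBot`, and the `example.com` URLs are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one crawler is banned, everyone else is kept out of /admin/.
robots_txt = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

def may_fetch(user_agent, url):
    """What a polite crawler asks itself before requesting a URL."""
    return parser.can_fetch(user_agent, url)
```

Nothing on the server side enforces this. A well-behaved crawler calls something like `may_fetch` and honours the answer; a misbehaving one skips the check, or announces itself under a different user agent so that the `BadBot` rule no longer applies to it.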

Sneaky imposters, even worse than weasels they can be:

"How to check that a robot belongs to Yandex

Some robots can disguise themselves as Yandex robots by indicating the relevant User Agent. You can check the authenticity of a robot using reverse DNS lookup.

Just follow these steps:

  1. Determine the IP address of the user agent in question using your server logs.

  2. Use a reverse DNS lookup of the IP address to determine the host domain name.

  3. Check whether the host belongs to Yandex. All Yandex robot names end in yandex.ru, yandex.net or yandex.com. If the host name has a different ending, the robot does not belong to Yandex.

  4. Make sure that the name is correct. Use a forward DNS lookup to get the IP address corresponding to the host name. It should match the IP address used in the reverse DNS lookup. If the IP addresses do not match it means that the host name is fake."
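The four steps quoted above can be sketched in Python with the standard `socket` module. This is a minimal sketch, not an official tool: the function names are hypothetical, the forward lookup only covers IPv4, and the suffixes carry a leading dot (a slightly stricter reading of "end in yandex.ru") so that a host like `notyandex.ru` does not pass.

```python
import socket

# Host name endings from the quoted Yandex instructions (leading dot added here).
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")

def reverse_lookup(ip):
    """Step 2: reverse DNS lookup, returns the host name for an IP address."""
    return socket.gethostbyaddr(ip)[0]

def forward_lookup(host):
    """Step 4: forward DNS lookup, returns the IPv4 addresses for a host name."""
    return socket.gethostbyname_ex(host)[2]

def is_genuine_crawler(ip, suffixes=YANDEX_SUFFIXES,
                       reverse=reverse_lookup, forward=forward_lookup):
    """Return True only if the IP (step 1, from your logs) passes steps 2 to 4."""
    try:
        host = reverse(ip).rstrip(".").lower()
        # Step 3: the host name must end in one of the expected domains.
        if not host.endswith(suffixes):
            return False
        # Step 4: the host name must resolve back to the same IP address.
        return ip in forward(host)
    except OSError:
        # No reverse record, or the name did not resolve: treat as unverified.
        return False
```

Calling `is_genuine_crawler("203.0.113.5")` with an address taken from your server logs performs live DNS queries. The `reverse` and `forward` parameters exist only so the logic can be tried with stand-in lookups, and passing a different `suffixes` tuple would let the same check be reused for another search engine that documents its crawler host names.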

There’s also a guide here you can refer to:

Do you happen to know if web crawlers like these index sites for voice search systems?

I get a few automated calls every day about my company not being registered with the main AI voice search engines. These seem to come from third-party companies, and I’m not sure whether what they do is legitimate.

Specifically: charging a subscription fee to “register” a company on search, or to help companies show up on the first page of search results.

I don’t know. I doubt it.

I have interacted with some people that I think can actually help with this, some using legitimate methods. They seem to be the exception in my estimation.

I don’t know anything about voice search either, and I don’t necessarily want my company to ever be indexed by those systems.

This is important to remember, especially these days: scammers are getting more sophisticated.

The calls I get often say “your Google listing has been flagged for review,” which suggests it is Google calling, but Google never calls. Some companies do call themselves a “Google partner company.” I’m not sure what that means, or whether it is even a thing.

I heard from one rep that there are at least 40 different companies that do this, calling startups to register them on voice search platforms. That explains why there are so many calls.
