Keeping crawlers off a staging site

A client with a staging site just raised a concern that the site could be crawled. (A crawler would have to go looking for their domain, but if one somehow stumbled on a link to the site, the site could be crawled.)

The obvious solutions are to disable allow_index_in_robots_txt and/or enable login_required, but that would mean remembering to reset them every time a backup from the production site is restored to the staging site.

Neither of these settings is shadowed_by_global, so they can't simply be pinned with an environment variable.

The options I have right now are to have app.yml modify site_settings.yml, adding shadowed_by_global to those settings (sketched below), or to create a plugin that sets them.
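
For the app.yml route, I'm picturing something like this in the run section, so the change is re-applied on every rebuild. This is only a sketch: the path is the standard container path, and the from/to patterns are assumptions that would need to match the current layout of site_settings.yml.

```yaml
## app.yml (sketch) -- runs at rebuild time, so a production restore
## followed by a rebuild can't silently drop the change.
run:
  - replace:
      filename: "/var/www/discourse/config/site_settings.yml"
      ## Assumed pattern: insert shadowed_by_global as the first
      ## sub-key of login_required; adjust to the file's real layout,
      ## and repeat for allow_index_in_robots_txt.
      from: /^  login_required:$/
      to: "  login_required:\n    shadowed_by_global: true"
```

The plugin route would do the same job by forcing the two settings from an after_initialize block, at the cost of maintaining a plugin just for staging.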

Am I missing something?

It’s highly likely that shadowed_by_global will be replaced by “every setting can be shadowed” real soon. :crossed_fingers:

https://github.com/discourse/discourse/pull/8061
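
Once that's in, you'd presumably be able to pin the staging-only values from the env section of app.yml, since global settings are read from DISCOURSE_-prefixed environment variables. The variable names below are my guess at how the setting names map:

```yaml
## app.yml (sketch) -- assumes both settings can be shadowed by a
## global setting, which is what the linked PR is about.
env:
  DISCOURSE_LOGIN_REQUIRED: true
  DISCOURSE_ALLOW_INDEX_IN_ROBOTS_TXT: false
```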
