Keeping crawlers off a staging site

A client with a staging site just raised a concern that the site could be crawled. (A crawler would have to go looking for their domain, but if one somehow stumbled on a link to the site, the site could be crawled.)

The obvious solutions are to disable allow_index_in_robots_txt and/or enable login_required, but that would mean remembering to reset them every time a backup from the production site is restored to the staging site.

Neither of these settings is shadowed_by_global, so they can't simply be pinned with an environment variable.

The options I have right now are to have app.yml modify site_settings.yml, adding shadowed_by_global to those settings (sketched below), or to create a plugin that sets them.
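
For the app.yml route, I'm picturing something like this in the run section, so the change is re-applied on every rebuild. This is only a sketch: the path is the standard container path, and the from/to patterns are assumptions that would need to match the current layout of site_settings.yml.

```yaml
## app.yml (sketch) -- runs at rebuild time, so a production restore
## followed by a rebuild can't silently drop the change.
run:
  - replace:
      filename: "/var/www/discourse/config/site_settings.yml"
      ## Assumed pattern: insert shadowed_by_global as the first
      ## sub-key of login_required; adjust to the file's real layout,
      ## and repeat for allow_index_in_robots_txt.
      from: /^  login_required:$/
      to: "  login_required:\n    shadowed_by_global: true"
```

The plugin route would do the same job by forcing the two settings from an after_initialize block, at the cost of maintaining a plugin just for staging.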

Am I missing something?

It’s highly likely that shadowed_by_global will be replaced by “every setting can be shadowed” real soon. :crossed_fingers:

https://github.com/discourse/discourse/pull/8061
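
Once that's in, you'd presumably be able to pin the staging-only values from the env section of app.yml, since global settings are read from DISCOURSE_-prefixed environment variables. The variable names below are my guess at how the setting names map:

```yaml
## app.yml (sketch) -- assumes both settings can be shadowed by a
## global setting, which is what the linked PR is about.
env:
  DISCOURSE_LOGIN_REQUIRED: true
  DISCOURSE_ALLOW_INDEX_IN_ROBOTS_TXT: false
```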
