Our transit provider is down due to a severe hardware fault. No estimated time to recovery is available. We are bringing up read-only mirrors of standard & business customer sites at a separate location. (ref)
Would it be possible to get an entry we can put in our hosts files or some other type of workaround to connect to the other servers for update access?
Hi @EricGT - if/when we fail over to another location, all changes will be handled automatically on our end. You don’t need to do anything.
(technically, we would change the DNS records of the affected sites to point to the new location)
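For anyone who wants to see for themselves whether their site's DNS record has moved, here is a minimal sketch using only Python's standard library. The function names `resolve_addresses` and `has_failed_over` are made up for illustration, and the hostname and old address in the usage comment are placeholders, not real values from this outage. Cached DNS answers can lag behind a change, so treat a "not yet" result as inconclusive.

```python
from __future__ import annotations

import socket


def resolve_addresses(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IP addresses the hostname resolves to."""
    # getaddrinfo asks the system resolver, so it honors both the hosts file and DNS.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def has_failed_over(hostname: str, old_addresses: set[str]) -> bool:
    """True if the hostname now resolves only to addresses outside old_addresses."""
    current = set(resolve_addresses(hostname))
    return bool(current) and current.isdisjoint(old_addresses)


# Hypothetical usage (placeholder hostname and address, replace with your own):
#   print(resolve_addresses("forum.example.com"))
#   print(has_failed_over("forum.example.com", {"192.0.2.10"}))
```

Because the lookup goes through the system resolver, a leftover hosts-file entry would override DNS here, which is one more reason not to add one if the provider handles failover through DNS.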
It seems really odd that such a major service interruption is nearly invisible here.
Only 1 thread (started by a user, not an admin) with 2 postings after some sites have been down for half a day?
Why isn’t there any communication?
The site is now back up on a different server and, while I have not done a detailed check, it is working as expected.
We have a special page you can track for these kinds of hosted-site updates:
I appreciate the Status page. But it does not provide a venue to discuss when the status update lacks clarity.
I had not dealt with a “transit provider” before, and the term does not Google well. (It pulls up more entries about mass transit moving people than about the internet moving packets.) And the “severe hardware fault” might be anything, from their backup generators running out of fuel during an extended power outage to their main building burning down. The communication inspired anxiety and uncertainty rather than relieving them.
In the context of the internet, a transit provider is a network operator that carries traffic between a customer's network and the rest of the internet. They rarely, if ever, share the specifics of a failure. At best they will talk about degraded services or the relative severity of outages.
These networks are typically built with several layers of redundancy, so that only compounded failures lead to actual downtime. A few providers, such as Cloudflare, will occasionally go into painstaking detail about their outages, but they’re the exception and not the rule.
Well, one good thing is that this experience is making us re-examine our method for communicating with our community when there is a problem.