Two months ago I mentioned the impact of the 2003 Cedar wildfires (http://blog.overlandstorage.com/archive/2007/08/30/when-the-sky-falls.aspx) and recommended using generic scenarios in your disaster and business continuity (BC) planning - for example:
• loss of services in our datacenter (for example Acts of Man, power, telecom, cooling etc)
• loss of access to our datacenter (as happened during the Cedar fire)
• loss of the datacenter facility (for example localized fire or flood)
• loss of the site (for example things that fall from the sky)
• regional disaster (for days when the sky is falling, the ground shaking or both)
Many of our suppliers, customers and even a few members of the press have asked how Overland was impacted by the recent firestorm - and this time was very different from 2003 (further proving that a specific 'firestorm' plan would have been pointless). The first of our plans - getting the designated emergency team in contact with each other, locating and ensuring the safety of our employees, checking on the direct threat to our offices, and locating and ensuring the security of the most recent backups - was carried out in the early hours of Monday morning, while the Witch fire raced towards the City and new fires were reported across the County. Unlike the Cedar fires there was no direct threat to our headquarters this time - but with 650,000 people evacuated over the following two days and as many again affected by travel restrictions, few of our employees (and at times only one person from the entire IT team) could even access the site: cue the 'loss of access' plans.
Since the Cedar fires we've continued to expand our 'lights out' management capabilities, so we were able to run the datacenter remotely. That included, where needed, power outlet level control and console access to key equipment, remote diagnosis of network issues, remote control tools for user support, and monitoring of the datacenter's physical environment.
With so many of our employees stuck at home or evacuated, our remote access usage jumped to three times the normal level (distributed of course across redundant VPN gateways and internet providers; all hail SSL VPNs with endpoint security clients, terminal servers and burstable metro Ethernet internet service). Local ISPs were moderately degraded due to high load, but the VPN was already tuned to 'tolerate' the poor connection quality common with wireless data cards. On a related note, credit is due to certain enlightened vendors - I'll mention Juniper but there may be others - that offer 'In Case of Emergency' (ICE) licenses for their remote access platforms, allowing companies to temporarily support a huge increase in the number of remote access users on demand. It's worth having some plans to handle extreme peaks in load (is that why the San Diego County emergency web site was effectively inaccessible for more than a day during the fires? oops), such as off-loading traffic to a hosting service, recovery site or alternate office - get the capacity planning wrong and you'll have a BC event of your own making.
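As a rough illustration of that capacity check (a sketch only, not Overland's actual tooling), the snippet below compares a surge in remote users against the licensed seat count, with and without a temporary emergency license. The class name, field names and all of the numbers are hypothetical; only the "three times normal" multiplier comes from the experience described above.

```python
import math
from dataclasses import dataclass


@dataclass
class RemoteAccessCapacity:
    """Hypothetical model of remote access seats; all values are illustrative."""
    normal_peak_users: int             # typical peak of concurrent remote users
    licensed_users: int                # seats covered by the standing license
    emergency_licensed_users: int = 0  # extra seats from a temporary emergency license

    def headroom(self, surge_multiplier: float) -> int:
        """Return spare seats (negative means a shortfall) for a given surge."""
        demand = math.ceil(self.normal_peak_users * surge_multiplier)
        return self.licensed_users + self.emergency_licensed_users - demand


# Assumed figures: 100 users at a normal peak, 250 standing seats.
standing = RemoteAccessCapacity(normal_peak_users=100, licensed_users=250)
with_emergency = RemoteAccessCapacity(
    normal_peak_users=100, licensed_users=250, emergency_licensed_users=200
)

print(standing.headroom(3.0))        # shortfall at three times normal usage
print(with_emergency.headroom(3.0))  # spare seats once emergency seats are added
```

The same arithmetic applies to gateway throughput or ISP bandwidth: pick the surge you want to survive, then check every shared resource against it before the emergency rather than during it.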
Another twist came on Tuesday when the local power company declared an emergency due to the loss of most of the long-distance transmission lines supplying the County. The sudden additional threat of blackouts cued the 'loss of services' plans: we reviewed our capability for sustained limited operations using our emergency generator, and quickly created a plan to minimize power use in the datacenter and preserve fuel, just in case. Handling both loss of access *and* loss of a utility at the same time would have added some challenges (one thing we can't tell remotely is how much fuel we have left!), but we adapted our plans on the fly as the nature of the emergency changed.
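To show why shedding load matters for fuel, here is a back-of-the-envelope runtime estimate. It is a sketch under a stated simplifying assumption: fuel burn rises linearly from an idle rate to a full-load rate. Real generators follow the manufacturer's consumption curve, and every figure below is hypothetical rather than taken from our generator.

```python
def estimate_runtime_hours(
    fuel_litres: float,
    idle_lph: float,
    full_load_lph: float,
    load_fraction: float,
) -> float:
    """Estimate generator runtime in hours.

    Assumes (for illustration only) that consumption in litres per hour
    scales linearly between the idle rate and the full-load rate.
    """
    if not 0.0 <= load_fraction <= 1.0:
        raise ValueError("load_fraction must be between 0 and 1")
    consumption_lph = idle_lph + (full_load_lph - idle_lph) * load_fraction
    return fuel_litres / consumption_lph


# Hypothetical tank and burn rates: compare normal load with a reduced-load plan.
normal = estimate_runtime_hours(1000, idle_lph=10, full_load_lph=50, load_fraction=0.5)
reduced = estimate_runtime_hours(1000, idle_lph=10, full_load_lph=50, load_fraction=0.25)
print(f"normal load: {normal:.1f} h, reduced load: {reduced:.1f} h")
```

Under these assumed numbers, halving the load does not double the runtime because of the idle burn, which is one reason to work out the figures in advance instead of guessing during an outage.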
There were a few other technical stumbling blocks - a sudden need for remote PBX reprogramming, for example - but by far the biggest challenge was 'loss of people' (figuratively). With so many of the team in the process of evacuating, we simply didn't have all the expertise we needed available all of the time (while one admin packed up their house, their backup was already evacuated, and the backup's backup was trying to get back into the region to their threatened home). Despite this, our CEO's public statement about the fires and our press releases were posted to our web site from an evacuation center, network issues were fixed from the freeway, and helpdesk calls were fielded within sight of the advancing fires. Technology and advance planning - widely deployed instant messaging, a collaborative intranet, conference bridges, email-integrated PDAs, a designated team for managing emergency responses and a dedicated employee call-in line - all helped us keep track of staff, off-load work to people who could work remotely, minimize cell phone usage (at the request of the County), quickly share team and company updates, and sustain critical functions such as customer support.
The sky didn't fall (apparently - hard to see with all the smoke), the ground didn't shake (this time), and our business continued to operate despite the regional disaster, thanks both to our BC plans and to the extraordinary commitment and dedication of Overland's employees on- and off-site. Thanks folks!
--
James Deveson
Director Information Technology