GoCloud staff have worked through two large earthquakes in New Zealand, so we know how important it is to have a plan and to make sure everyone knows what to do.
During the Christchurch earthquake, we were flown in to a large business to recover its IT systems. All of the billing was done from computer systems within the office.
They had a list of 100 applications that were all Priority 1 to get back online, but no ordering of importance beyond that.
The first step was to get as much data out of the area as possible. Because the client was a utility company (an electricity generator and retailer), we thought we could get priority on network bandwidth. Unfortunately, this had all gone to the emergency services.
Adding to the pressure, the diesel generator was rapidly running out of fuel and, to be honest, we were lucky it even worked, as it hadn't been tested in ages! We pulled virtual machines and disks across the network to Wellington. The site was destroyed and very dangerous: the ceilings had collapsed and the lighting cables were still very much live.
We tried to get the backup tapes sent to Wellington, but the offsite storage company was uncontactable.
Once we got back to Wellington, we started to reprioritise the list of systems and, over about three months, managed to get them all back online.
Our client had a call centre, which was now also offline. Customers had no power, so it was a priority for us to get that back up and running ASAP. They also had a contract with a telecommunications company to use a space in the event of a disaster, but the provider had sold that same space several times over and there wasn't any left.
We redirected all the lines to Wellington and flew the operators in. To be honest, I think they were glad to be out of Christchurch!
Here are our top ten tips:
- have a Business Continuity Plan (BCP)
- test it
- have a Disaster Recovery Plan
- test it
- know where your systems are, and dig a little deeper than just prioritising the restores: decide the order within each priority level too
- ask your suppliers how they will operate in a disaster - they'll need a plan of their own to support you
- don't keep all your systems in one place
- have multiple communication channels; mobile networks were the first to go offline
- where possible, use public cloud services like AWS and Azure for key systems, as availability is often built in
- plan for staff availability, family first!
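
To make the "dig a little deeper" tip a bit more concrete, here is a minimal sketch of how a flat "everything is Priority 1" list could be turned into an actual restore order. It is an illustration only, not the process we used in Christchurch: the system names, impact ranks and dependencies below are all made up, and it assumes Python 3.9 or later for the standard-library `graphlib` module. The idea is simply that a system can't come back before the things it depends on, and among the systems that are ready to restore, the one with the biggest business impact goes first.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: every entry is "Priority 1" on paper.
# "impact" is an assumed business ranking (1 = restore first);
# "needs" lists the systems that must be running beforehand.
SYSTEMS = {
    "billing":            {"impact": 2, "needs": ["database", "directory"]},
    "call_centre_phones": {"impact": 1, "needs": ["network"]},
    "database":           {"impact": 3, "needs": ["network", "storage"]},
    "directory":          {"impact": 3, "needs": ["network"]},
    "network":            {"impact": 3, "needs": []},
    "storage":            {"impact": 3, "needs": []},
}


def restore_order(systems):
    """Return a restore order that respects dependencies.

    Systems are released in waves: each wave contains everything whose
    dependencies are already restored, sorted by impact and then by name.
    """
    sorter = TopologicalSorter(
        {name: info["needs"] for name, info in systems.items()}
    )
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    order = []
    while sorter.is_active():
        ready = sorted(
            sorter.get_ready(),
            key=lambda name: (systems[name]["impact"], name),
        )
        for name in ready:
            order.append(name)
            sorter.done(name)  # unlocks anything that was waiting on it
    return order


order = restore_order(SYSTEMS)
for step, name in enumerate(order, start=1):
    print(step, name)
```

In this made-up example the network and storage come back first because everything else depends on them, and the call centre phones are restored before the database and billing because they were ranked as the highest impact. Even a rough list like this, written down and tested before a disaster, is far more useful than 100 applications that are all "Priority 1".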