Sorry for the delay on this post, but a hurricane got in the way. Being based in New Jersey, I got my first experience of living through one of the largest storms ever to make landfall in the New York / New Jersey area. I can assure you, it wasn’t an enjoyable experience.
While sitting in my home, in the cold and dark for the fourth consecutive day, I pondered how resilient telecommunications facilities need to be. We have four major network points of presence (POPs) and data centres in the New Jersey / New York area, and I’m very proud to say they held up exceptionally well during the hurricane and afterwards. We suffered one 2-3 hour outage at one of our locations due to multiple failures within the site after commercial power was lost. The other three sites remained in service the entire time. Given the overall destruction the storm unleashed, that was a pretty good result. I wish my wireless network provider could make the same claim. On top of the extended period in the cold and dark without power, even basic voice and text services were spotty at best, and at worst completely unavailable, for days after the event.
That leads me to question how much redundancy is necessary. Having redundant network POPs in the same large metro area is generally required, but is redundancy within each of the POPs / data centres necessary as well, and if so, how much? I used to really question that design practice: not only having multiple POPs, but N+1 redundancy within each POP / data centre as well. I can’t tell you how many times I’ve toured our network sites and looked at N+1 generators, UPS configurations and rooms full of batteries that were never in use. Well, after this experience, I can say that adhering to that design practice and spending the money on that level of redundancy was worth every penny these past two weeks.
The second question that I got to see in action firsthand during this super storm was how good your company’s operations procedures and operations staff are when an event like this happens. You can only plan for so many scenarios. Once the situation goes beyond the planned scenarios, the results, good or bad, are generally dictated by how good the operations staff is at adapting to the challenge at hand. The one location where we suffered an outage was just one of those scenarios. Commercial power was out, and then within the POP we suffered two more failures. This was a situation that we had never planned for. However, the operations staff we had on site responded creatively and acted immediately to address these unforeseen failures. That action limited the downtime significantly.
Our on-site teams, regional managers and colleagues didn’t hesitate to step up and take immediate ownership of the situation. The teams went above and beyond the call of duty by working around the clock to ensure the impact on our business and customers around the world was minimised.
So how much redundancy do you design for? Can it withstand the 100-year storm? And how well does the operations team respond to scenarios that aren’t planned for? Spend some time answering these questions for your business before the next event of this scale. It can make the difference between being the operator that failed to deliver and the operator that proved resilient.
Our thoughts are with all those who suffered through this terrible storm.