The True Cost of Data Center Downtime
Outages and downtime affect every business and can be extremely costly. Choosing the right data center provider and facility can minimize outage risk and keep mission-critical processes running.
IT infrastructure efficiency and reliability are indispensable to the success of any business. Any anomaly in those assets can trigger a chain of events that results in costly, and sometimes irreparable, damage. For example, in May 2017, over 75,000 people had their three-day weekend plans stalled when British Airways suffered a massive data center failure. Along with the chaos of lost luggage, broken trust, and the frustration of canceled and delayed flights, the U.K.’s largest airline also had to deal with a loss of over $75 million.
The outage was caused by a single engineer who disconnected and reconnected the power supply in a disorganized fashion, triggering a power surge that disrupted the operations of the entire BA infrastructure.
Outages like these are not anomalies. The rate and severity of these incidents grew significantly over the last year (2018), according to Uptime Institute’s Global Data Center Survey. What’s surprising is that 80% of the survey respondents believed that their most recent outage was preventable.
Data Center Outages: The Great Equalizer
Even the data center behemoths aren’t immune to outages, including Amazon Web Services (AWS), which is widely considered the world’s biggest cloud provider and is home to some of the biggest names on the Internet, such as Slack, Quora, Netflix, and Airbnb. In 2017, a mistyped command entered by an AWS engineer caused many sites to go down for several hours, resulting in losses of over $150 million.
A similar incident occurred a year later at one of AWS’ biggest rivals. In September 2018, a lightning strike caused a voltage surge at Microsoft Azure, resulting in an outage and damage to hardware, including network devices and storage units.
The Aftermath of a Data Center Outage
The impact of an outage varies between multinational corporations and small businesses. While the scale and severity of the damage differ, every affected organization feels the impact in some facet of its business. Why? Because data centers have become the backbone that enterprises’ mission-critical applications depend on. Any interruption to uptime has the potential to completely paralyze the core operations of the business.
Here are 3 common impacts of a data center outage:
- An expensive disaster and recovery – The average cost of a data center outage was $740,357 in 2016, according to a Ponemon Institute research report. Although that amount looks staggering, it may not mean much to you personally. But research has also shown that 93% of businesses that suffered an outage lasting more than 10 days filed for bankruptcy within a year of the event. Along with the massive loss of revenue during the outage, the repair costs can be outrageous. While big corporations may have enough resources to recover, smaller companies could be completely wiped out by just one such outage.
- Decreased productivity – Businesses today rely heavily on online applications, project management platforms, and software to perform day-to-day tasks. During an outage, your employees may be unable to do anything except sit around and wait for their servers to come back online. From operating costs to paying employees overtime to fix the damage, the expenses can add up significantly.
- Damaged reputation – This intangible cost of an outage often gets overlooked. Customers of popular brands may forgive one outage, but if the incidents keep happening, you might lose credibility in their eyes. Smaller brands and companies cannot afford to pay the price of a damaged reputation.
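To make the first two impacts concrete for your own business, a rough back-of-the-envelope estimate can help. The sketch below is only an illustration: the `OutageInputs` fields, the `estimate_outage_cost` function, and every number in the example are hypothetical placeholders rather than survey data, and reputational damage is deliberately left out because it is hard to quantify.

```python
from dataclasses import dataclass


@dataclass
class OutageInputs:
    """Hypothetical inputs; replace every value with your own figures."""
    hours_down: float          # length of the outage in hours
    revenue_per_hour: float    # revenue that depends on the affected systems
    idle_employees: int        # staff who cannot work during the outage
    loaded_hourly_wage: float  # average fully loaded cost per employee-hour
    recovery_cost: float       # repairs, overtime, replacement hardware


def estimate_outage_cost(x: OutageInputs) -> dict:
    """Return a simple breakdown of direct outage costs."""
    lost_revenue = x.hours_down * x.revenue_per_hour
    lost_productivity = x.hours_down * x.idle_employees * x.loaded_hourly_wage
    total = lost_revenue + lost_productivity + x.recovery_cost
    return {
        "lost_revenue": lost_revenue,
        "lost_productivity": lost_productivity,
        "recovery_cost": x.recovery_cost,
        "total": total,
    }


if __name__ == "__main__":
    # Placeholder scenario: a 4-hour outage at a small online business.
    example = OutageInputs(
        hours_down=4,
        revenue_per_hour=10_000,
        idle_employees=50,
        loaded_hourly_wage=40,
        recovery_cost=15_000,
    )
    for item, amount in estimate_outage_cost(example).items():
        print(f"{item}: ${amount:,.0f}")
```

Even a crude model like this shows how quickly lost revenue and idle staff time can outweigh the repair bill itself.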
Common Causes of Data Center Outages
IT infrastructure is getting increasingly complex with the heavy adoption of cloud and hybrid IT systems, and the challenges data centers face are becoming more sophisticated. But history is a great teacher, and past outages point to a few common culprits that are responsible for the majority of such disasters.
Here are 6 of them:
- Human error – Making mistakes is part of being human, and one wrong action can set off a surprise outage. Uptime Institute states that 70% of data center failures can be attributed to human error. Simple mistakes like misadjusting the temperature, removing a power plug, or mistyping a command are enough to trigger a snowballing catastrophe within the four walls of a data center.
- Network failure – Data center networks host e-commerce stores, video streaming sites, data analytics, and many other applications that depend on reliable network connectivity. Several factors can cause network failure, including cyber crimes, equipment and software crashes, and human error. Businesses that handle network connectivity in-house are more prone to longer downtime because they often lack the redundant network setup commonly found at data center colocation providers.
- Power outages – Power is the oxygen of a data center; without it, the facility cannot function. The Global Data Center Survey report by Uptime Institute shows that while data centers have become better at managing power efficiently, the rate of power outages has increased. Causes of power outages in the data center include power grid failure, poorly designed power distribution units, and transformer blowouts.
- UPS system failure – An uninterruptible power supply (UPS) is designed to keep the data center running even if the main power supply malfunctions. Yet UPS failures are a leading cause (25%) of data center downtime among both cloud data centers and colocation facilities. Battery failure or a damaged fan, sensor, capacitor, or circuit board can all contribute to a UPS system failure.
- Natural disasters – Some data centers can withstand a massive earthquake, while others are disrupted by just slight flooding. Natural disasters are no longer black swan events that happen once in a century. According to an Uptime Institute report, climate change is going to aggravate the impact that natural disasters have on businesses. For example, Hurricane Sandy caused extensive data center outages across New York and New Jersey.
- Cyber crimes – Cyber attacks, specifically Distributed Denial of Service (DDoS) attacks, were responsible for 22% of unplanned outages in 2015, according to a survey conducted by Ponemon Institute. A DDoS attack is a type of cyber crime in which many systems (often a botnet) flood a server or network with traffic to disrupt its operation. The rate of attacks is expected to rise in the coming years because of the high prevalence of ready-to-go DDoS malware kits and DDoS-as-a-Service providers.
The Importance of Power and Connectivity Redundancy
Redundancy is the duplication of processes or components in order to increase reliability in the event of a failure. Businesses need to ensure that their IT infrastructure stays up even if some components stop working. Redundancy brings IT infrastructure closer to the goal of zero downtime.
As mentioned above, two critical causes of data center outages are network failure and power failure.
Making sure that your data center has strong network redundancy in place is critical to business continuity. To maintain a high level of availability, or to meet the famous ‘five nines’ (99.999%) uptime target, your network architecture needs strong backups. Having a secondary or tertiary method of access helps keep your business up and running, even through a network disaster.
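To put an uptime target such as five nines in perspective, it helps to convert the percentage into a yearly downtime budget. The snippet below is a minimal sketch of that arithmetic; the function name `allowed_downtime_minutes` is an illustrative choice, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

By this calculation, five nines leaves a little over five minutes of downtime per year, which is why a single unprotected network path is rarely enough to meet it.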
Power redundancy is another critical area that shouldn’t be ignored, because power is indispensable to the operation of the data center. A reliable source of redundant power through backup generators, UPS systems, and PDUs helps ensure that the facility always has enough power to fully support its load, even if the main power supply fails.
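The effect of redundancy on availability can be approximated with simple probability. The sketch below assumes that redundant components fail independently of one another and that any single working copy is enough to keep the service up. Real power feeds and network paths often share failure causes, so treat the result as an optimistic upper bound; the function name `combined_availability` is illustrative.

```python
def combined_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components in parallel.

    Assumes independent failures: the system is down only when
    every copy is down at the same time.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("component_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 - (1 - component_availability) ** copies


if __name__ == "__main__":
    # Illustrative figure only: each path or power feed is up 99% of the time.
    for n in (1, 2, 3):
        print(f"{n} copies: {combined_availability(0.99, n):.6%} available")
```

Under these assumptions, adding a second 99% component lifts the combined figure to about 99.99%, and a third brings it to roughly 99.9999%.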
Find Data Centers With the Right Reliability for Your Needs
Reliability, when it comes to a data center, denotes the ability of systems or components to carry out their intended functions for a specific period of time. This is different from availability, which refers to how accessible systems and components are when needed. That said, the two are related, and both are important.
As a metric, reliability highlights how well a data center meets the needs of a business. It is frequently assessed using the globally renowned four-level ranking system created by Uptime Institute, known as the Tier Classification System.
Each data center tier has its own qualification criteria with regard to power, cooling, maintenance, and fault tolerance. Tier I and Tier II are meant for organizations that don’t fully depend on real-time delivery of products for their revenue. Tier III and Tier IV offer higher availability and are meant for organizations for which every lost minute might mean a loss of thousands to millions of dollars.
Downtime is Inevitable. Preparedness is Key
So what’s the true cost of a data center outage? Loss of dollars, yes, but also a loss of customers, reputation, productivity, peace of mind, and, in some cases, the business itself. Unplanned outages can happen to anyone, but the number of outages and the severity of their impact can be mitigated by choosing facilities and providers with intelligent infrastructure and redundancy options that are resilient enough to weather storms, power loss, and bot attacks. If outages are inevitable, then choosing a dependable data center partner will help reduce the likelihood of being hit by one.
You can find data centers with the reliability and resiliency for your needs using UpStack’s marketplace. If you need help assessing the level of reliability your applications and services need from colocation, you can contact an UpStack advisor who will craft a custom solution for your business.