Disaster Recovery is Broken but Don't Blame IT

May 19, 2016 James Kessinger

We completed a survey of 343 IT execs responsible for disaster recovery (DR) more than a year ago, and the findings weren't encouraging. With recent widespread outages announced by some of the world's most popular brands, the results are more relevant than ever.  

The survey showed serious gaps in DR protection at most firms.  The leading causes of the gaps: 1) limited internal resources; and 2) process complexity.

Read the Entire Report 

One thing is clear from the results: when it comes to disaster recovery, most firms have ignored best practices in favor of a patchwork of compromises. Those compromises ultimately mean more work for IT departments, higher expense and greater business continuity risk for the organization.

Before You Blame IT, Read On

Why would highly trained IT pros accept costly compromises that create work and increase risk and expense?  In short: because disaster recovery as it has been practiced at most firms is broken. It is only effective for firms willing to make sizable investments in people and processes.  For most firms, it is management by checkbox and denial, which leads to a complex and hard-to-maintain process.

The good news is that with new hyper-scale cloud DR operating models, including pilot light DR, firms can leverage automation to reduce costs and complexity, enhance protection and improve recovery objectives.  More about that later. 

Let’s consider the core of the problem: the fact that a lack of internal resources and process complexity are causing firms to avoid testing their DR environments, even after considerable capex and opex investments are made in secondary data centers.  That is probably why one CIO recently called secondary data centers a terrible waste.

Infrequent Testing is Degrading DR Protection

Infrequent testing is the core of the DR problem.  When asked how frequently they tested their DR environment, more than half of the respondents indicated that they tested once a year or less; even worse, a third said that they tested infrequently or never.

Given the accelerating rate of change in most app environments, quarterly testing is seen as an acceptable practice for most applications.  Yet only 42% of those surveyed tested quarterly or monthly.

When IT pros were asked why they did not adequately test their DR environments (and accept unnecessary gaps in protection), respondents cited several challenges, including inadequate internal resources, DR process complexity and the lack of prioritization.


Infrequent Testing: DR test frequency reported by survey respondents, July 2015. 33%: Infrequent/never. 26%: Quarterly. 25%: Annually. 16%: Monthly.
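
As a quick check on how the chart lines up with the figures quoted above, the shares can be tallied in a few lines of Python. This is only a sketch: the 26% slice is assumed to be the quarterly group, because the text says 42% tested quarterly or monthly and 16% tested monthly. The dictionary keys and the `share` helper are illustrative names, not part of the survey.

```python
# Shares of respondents by DR test frequency, from the July 2015 survey chart.
# Assumption: the 26% slice is the "quarterly" group (42% quarterly-or-monthly
# minus 16% monthly).
test_frequency = {
    "monthly": 16,
    "quarterly": 26,
    "annually": 25,
    "infrequent_or_never": 33,
}


def share(*buckets):
    """Sum the percentage shares of the named buckets."""
    return sum(test_frequency[bucket] for bucket in buckets)


quarterly_or_better = share("monthly", "quarterly")
annual_or_worse = share("annually", "infrequent_or_never")

print(f"Quarterly or more often: {quarterly_or_better}%")
print(f"Annually or less often: {annual_or_worse}%")
```

Read this way, the two groups account for every respondent: the firms meeting a quarterly cadence, and the majority that tests once a year or less.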

Software Integrated with Cloud APIs Enables Push-Button DR Tests

Pilot light DR in the cloud promises to make regular DR testing much easier than with traditional approaches. By automating critical tasks, including launching tests with the push of a button,  existing IT teams can easily adopt quarterly (or more frequent) testing schedules without having to schedule resources ahead of time from a non-scalable service provider.
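
To make the idea of a testing schedule concrete, here is a minimal sketch of how a team might flag applications whose last DR test falls outside a quarterly window. It does not use any vendor API: the 92-day threshold, the function names and the application names are all hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

# Hypothetical policy: each application is DR-tested at least once a quarter.
MAX_DAYS_BETWEEN_TESTS = 92


def is_test_overdue(last_test: Optional[date], today: date,
                    max_days: int = MAX_DAYS_BETWEEN_TESTS) -> bool:
    """Return True if the app was never tested or the last test is too old."""
    if last_test is None:
        return True
    return (today - last_test) > timedelta(days=max_days)


def overdue_apps(last_tests: Dict[str, Optional[date]], today: date) -> List[str]:
    """List, alphabetically, the apps that are due for a DR test."""
    return sorted(app for app, last in last_tests.items()
                  if is_test_overdue(last, today))


# Illustrative data only; None means the app has never been tested.
last_tests = {
    "billing": date(2016, 3, 1),
    "crm": date(2015, 7, 15),
    "intranet": None,
}
print(overdue_apps(last_tests, today=date(2016, 5, 19)))
```

A report like this is only the bookkeeping half of the problem; the tests themselves still have to be cheap enough to launch that the list stays short.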

Using storage volumes in the cloud to store a synchronized, updated copy of an entire app environment, including network and security settings, configs, patches and data, means that costs can be kept low compared to a traditional secondary data center and production instances can be launched at the push of a button. This approach is often called “pay as you go” DR.

Why DR Testing is Infrequent, reported by survey respondents, July 2015. 37%: Inadequate internal resources. 23%: Process complexity. 19%: Not a priority. 12%: Internal costs too high. 5%: DR vendor long lead time. 4%: DR vendor has high fees.

Recovery Point Objectives

Our survey also found that recovery point objectives were not as troublesome as test frequency. Although about 33% of respondents did report RPOs greater than 12 hours, 58% had RPOs of less than 12 hours.  22% even reported RPOs of less than 2 hours.


More Good News

Using the cloud as a secondary data center can also reduce RPOs from 12+ hours to one hour or less, depending upon the frequency of changes to the app environment and the amount of available bandwidth.  This dramatically improves agility and assures that an organization can meet or exceed DR objectives with existing internal resources.
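
The dependence on change rate and bandwidth can be roughed out with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a description of how any product computes RPO: it assumes changes are shipped to the cloud at a fixed sync interval, uses decimal units (1 GB = 8,000 megabits), and ignores compression, protocol overhead and changes that pile up while a transfer is running. The function names and sample numbers are hypothetical.

```python
def transfer_hours(changed_gb: float, bandwidth_mbps: float) -> float:
    """Hours needed to copy changed_gb gigabytes over a bandwidth_mbps link."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    megabits = changed_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    return megabits / bandwidth_mbps / 3600


def estimated_rpo_hours(sync_interval_hours: float,
                        changed_gb_per_hour: float,
                        bandwidth_mbps: float) -> float:
    """Rough worst-case data-loss window.

    One full sync interval, plus the time to ship that interval's changes.
    """
    changed_gb = changed_gb_per_hour * sync_interval_hours
    return sync_interval_hours + transfer_hours(changed_gb, bandwidth_mbps)


# Illustrative numbers only: sync every 30 minutes, 10 GB of changes per hour,
# 100 Mbps of available bandwidth.
rpo = estimated_rpo_hours(sync_interval_hours=0.5,
                          changed_gb_per_hour=10,
                          bandwidth_mbps=100)
print(f"Estimated RPO: {rpo:.2f} hours")
```

Under these assumptions, a busier app environment or a thinner link pushes the estimate up, which is why the achievable RPO varies from one organization to the next.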

Similar reductions can be attained for Recovery Time Objectives (RTOs) as well.  In addition to reducing costs and increasing agility, the cloud can reduce RPOs and RTOs for many organizations.

Looking Forward: Automation is Key

The first generation of cloud DR and migration tools worked well for app environments confined to a particular virtualization platform.  Then a second generation of tools emerged that were virtualization platform agnostic, yet incurred high costs and risks for environments with physical servers in the mix.

Both generations, however, fell short when it came to automating cloud deployment (for migration or disaster recovery) for environments with physical workloads.  As they entered the market they convinced IT that the promise of automation was limited to environments that were already 100% virtualized.

I talked about these solutions with InformationWeek a few weeks ago at Interop, where CloudVelox was a Best of Interop Finalist in Cloud.  We were architected from the beginning for complex app environments with physical and virtual workloads and extensive network and security controls.

Don’t Blame IT: Survey Results Break IT “Anti-Cloud” Stereotype

When asked if the ability to automatically extend network and security controls into the cloud would impact their willingness to leverage the cloud as a secondary data center, 55% said yes. That is contrary to the stereotype of IT being resistant to the cloud.  If anything, it shows a substantial willingness to adopt the cloud for DR if network and security controls can be automated.

State of DR Survey Infographic Now Available!

About the Survey


Participants were invited by email to complete a series of questions regarding their disaster recovery practices. In exchange they were offered a gift card and entered in a drawing for an Apple Watch.


About 31% of respondents were responsible for business units with more than 1000 employees, while almost 56% represented units with between 100 and 1000 employees.

Nine industries were represented in the survey, with consumer goods and services (22%), financial services (20%) and health care (16%) representing the largest number of participants.

Further reading: The Cloud will Crush Traditional DR

Find out about Pilot Light DR from CloudVelox


Read the Pilot Light DR Datasheet