A guide to disaster recovery testing
Disaster recovery testing is an important element of an organization's business continuity and disaster recovery plan.
Disaster recovery (DR) testing is the process of running detailed tests to verify that a business can recover all data, restore business-critical applications and continue operating after a serious interruption.
In this comprehensive guide, we'll look at how different types of disasters can affect businesses and the various aspects of DR testing, including recovery time, the testing process itself, DR best practices, the resources needed to implement recovery plans and more. We'll include a step-by-step plan to help review management objectives and enable seamless recovery processes.
Defining a disaster
It can be difficult to categorize a disastrous event, but broadly speaking, disasters are either man-made or natural. These two groups can be divided loosely into further sub-categories, and some events are 'hybrids' that fit more than one.
Wildfires, floods, hurricanes, mudslides, tornados and earthquakes are classed as natural disasters. Any disaster scenario that involves the elements can throw a business’s carefully laid plans and projections into serious disarray. Such catastrophic events can break supply chains, prevent employees from getting to work, and cause damage or destruction to vital facilities or equipment. That’s why disaster recovery planning should be just as high a priority for organizations as having a proactive future growth plan.
COVID-19 (and other pandemics)
We've put this in a category of its own because while a pandemic is classed as a 'natural disaster', it crosses over into the man-made and/or biological categories. While COVID hasn't caused physical damage to facilities or equipment, it has certainly prevented employees from getting to work, broken supply chains and destroyed the operational structure of business environments as we'd come to know them. If there's one thing that businesses have learned from this pandemic, it's that disaster recovery planning enables business continuity.
There are many types of disasters that can affect a business's operations. An operational disaster occurs, for example, when an organization loses a significant manager, head of department or director. Businesses can also be affected by operational disasters when contracts are broken or when a business environment becomes severely unfavorable for operating.
Technological disasters also have a profound effect on business operations. Examples include events caused by malfunctioning technology, such as network connection issues, data loss, server problems or security breaches. They generally involve some human error, so they could be classed as man-made, meaning that there is an 'identifiable cause'.
Disaster recovery testing is designed to help a business stay ahead of problems that could result in a loss of data in the future. According to a widely cited figure attributed to the National Archives & Records Administration in Washington, 93 percent of companies that lose access to their data for 10 days or more due to a disaster file for bankruptcy within a year.
How disasters could affect a business
Depending on specific circumstances, here are some examples of how the above types of disasters could significantly derail business continuity. Later, we'll look at how disaster recovery planning and recovery testing can safeguard against such events, reduce recovery time in the future and help restore business continuity. Every business is different, however, and the disaster recovery plan that works for one organization may be entirely unsuitable for another.
- Natural disasters - for example, fire, or flooding caused by heavy rain, or wind damage following storms. Disaster recovery testing for natural disasters involves the instigation of more specific emergency procedures, including evacuation processes.
- Theft or sabotage - theft of computer equipment, or infiltration of IT security could result in loss of data and critical files, as well as potentially holding a business to ransom. System backup on a regular basis is an important part of a DR plan.
- Power cuts - loss of power could have serious consequences including prolonged downtime, affecting the ability to work effectively. Even a short period of downtime can have a huge impact on a business's bottom line. A solid DR plan will provide backup in the event of power failure.
- IT network failure - with many organizations relying heavily on technology for their collaboration and communication needs, a network failure can disrupt important meetings and potentially result in the loss of clients or customers. Disaster recovery is an intrinsic part of every IT infrastructure.
- Loss or illness of key staff - if any member of your staff is central to the running of your business, consider what would happen if they were to leave or be incapacitated by illness. A disaster recovery plan could include additional personnel training as backup.
- Outbreak of disease or infection - almost every business worldwide has recently had to put disaster recovery measures into practice while dealing with the outbreak of an infectious disease. Disaster recovery testing in this case is ongoing, so that a business is well prepared for similar incidents in the future.
- Crises affecting the reputation of a business - disaster recovery is an important consideration for wholesale and retail businesses in the event of a crisis like a product recall. A disaster like this could severely damage a company's reputation and potentially have a crippling effect financially.
Goals of disaster recovery testing
One of the main goals of disaster recovery testing is to find out whether a DR plan works and meets an organization's predetermined Recovery Point Objective (RPO) and Recovery Time Objective (RTO) requirements. Recovery testing also provides feedback to enterprises so they can amend their DR plan should any unexpected issues arise.
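As a rough illustration of how a test result can be compared with those objectives, the sketch below checks one measured recovery against an RTO and an RPO. It assumes the usual meaning of the terms (RTO as the maximum acceptable downtime, RPO as the maximum acceptable window of lost data); the class names, field names and figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable window of lost data


@dataclass
class DrTestResult:
    outage_start: datetime      # when the simulated disaster began
    service_restored: datetime  # when the application was usable again
    last_good_backup: datetime  # timestamp of the backup that was restored


def evaluate(result: DrTestResult, objectives: RecoveryObjectives) -> dict:
    """Compare measured downtime and data loss with the targets."""
    downtime = result.service_restored - result.outage_start
    data_loss = result.outage_start - result.last_good_backup
    return {
        "downtime": downtime,
        "data_loss_window": data_loss,
        "rto_met": downtime <= objectives.rto,
        "rpo_met": data_loss <= objectives.rpo,
    }


# Hypothetical figures, for illustration only.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
result = DrTestResult(
    outage_start=datetime(2024, 1, 15, 9, 0),
    service_restored=datetime(2024, 1, 15, 12, 30),
    last_good_backup=datetime(2024, 1, 15, 7, 30),
)
report = evaluate(result, objectives)
print(report["rto_met"], report["rpo_met"])
```

In this made-up run the service came back within the four-hour RTO, but the restored backup was 90 minutes old, so the one-hour RPO was missed. That is the kind of finding a test is meant to feed back into the plan.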
IT systems are rarely static in nature, so each time an organization adds a new element or installs an upgrade to the system, those additions need to be tested again. For example, storage systems and servers may have been added or upgraded, new applications deployed and older applications updated since an organization developed its original disaster recovery plan.
With more and more organizations migrating to the cloud, cloud services are playing a larger role in an organization's IT infrastructure. A disaster recovery test helps to make sure a DR plan stays current in an IT world that changes constantly.
Getting started with disaster recovery testing
Disaster recovery testing, as we've already mentioned, is different for every business. However, there are some basic steps that need to be taken before the actual process of testing begins.
Step 1: Perform an audit of IT resources
Before business continuity and normality can resume after a disaster, businesses need to know what 'normal' actually is. This involves identifying all the disparate assets that exist on the business network infrastructure. By creating an inventory of all of the IT resources on the network, and identifying what they contain, a business can start the process of consolidation, which makes future backup and recovery simpler and more streamlined.
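An inventory like this can start as a simple structured list. The sketch below is one possible shape for it, assuming each asset is described by a name, a kind, a location and what it contains; the field names and sample entries are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str      # hostname or system name
    kind: str      # e.g. "server", "database", "saas"
    location: str  # where the asset runs
    contains: str  # what data or service it holds


def group_by_kind(assets: list[Asset]) -> dict[str, list[str]]:
    """Group asset names by kind so related systems can be reviewed together."""
    grouped = defaultdict(list)
    for asset in assets:
        grouped[asset.kind].append(asset.name)
    return dict(grouped)


# Hypothetical inventory entries.
inventory = [
    Asset("fs-01", "server", "head office", "shared documents"),
    Asset("db-01", "database", "head office", "customer orders"),
    Asset("crm", "saas", "vendor cloud", "customer contacts"),
    Asset("db-02", "database", "branch office", "payroll"),
]
by_kind = group_by_kind(inventory)
for kind, names in by_kind.items():
    print(kind, names)
```

Grouping by kind is only one view; the same list could be grouped by location to see which assets a single site failure would take out.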
Step 2: Decide what is mission critical
During the audit of assets, businesses may find that a great deal of data is actually redundant, or not necessary to keep the system running. Transferring every piece of unnecessary data in the network to a backup server could use a huge amount of storage, bandwidth and processing power. Excluding redundant data can reduce the size of a backup, saving storage space and expense.
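To make the saving concrete, the following sketch estimates how much smaller a backup becomes when only mission-critical data sets are included. The `mission_critical` flag, the data set names and the sizes are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    size_gb: float
    mission_critical: bool  # decided during the audit


def plan_backup(datasets: list[DataSet]) -> tuple[list[str], float, float]:
    """Return the names to back up, the backup size, and the space saved (GB)."""
    selected = [d for d in datasets if d.mission_critical]
    total = sum(d.size_gb for d in datasets)
    kept = sum(d.size_gb for d in selected)
    return [d.name for d in selected], kept, total - kept


# Hypothetical audit results.
audit = [
    DataSet("orders-db", 120.0, True),
    DataSet("finance-share", 40.0, True),
    DataSet("old-installers", 300.0, False),
    DataSet("temp-exports", 90.0, False),
]
names, backup_gb, saved_gb = plan_backup(audit)
print(names, backup_gb, saved_gb)
```

With these sample numbers the backup shrinks from 550 GB to 160 GB. In practice the hard part is the classification itself, not the arithmetic.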
Step 3: Create specific roles and responsibilities for all involved in the DR plan
Every employee in an organization should have a role to play in an effective disaster recovery plan. While automated disaster recovery testing serves an important purpose in a DR plan, it only tests the technical components. If a real disaster occurs, it's the people within an organization who will need to know what to do to rapidly restore uptime.
When everyone knows their part in an emergency, the DR plan can be carried out faster and with fewer mistakes than if people have to work out their responsibilities while the disaster is unfolding.
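One lightweight way to check that responsibilities are covered is to list each recovery task with its owners and look for gaps. The sketch below assumes every task should have a primary owner and a trained backup person; the task names and roles are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assignment:
    task: str
    primary: Optional[str]         # person or role responsible for the task
    backup: Optional[str] = None   # who steps in if the primary is unavailable


def find_gaps(assignments: list[Assignment]) -> list[str]:
    """List tasks that lack a primary owner or a backup owner."""
    gaps = []
    for item in assignments:
        if not item.primary:
            gaps.append(f"{item.task}: no primary owner")
        elif not item.backup:
            gaps.append(f"{item.task}: no backup owner")
    return gaps


# Hypothetical assignments.
plan = [
    Assignment("Declare the disaster", "IT director", "Operations manager"),
    Assignment("Restore database from backup", "Database administrator"),
    Assignment("Notify customers", None),
]
gaps = find_gaps(plan)
for gap in gaps:
    print(gap)
```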
Step 4: Determine your recovery goals
Decide how quickly your organization needs to recover, and set your RTOs (how long a system can be down) and RPOs (how much recent data, measured in time, you can afford to lose). This could involve prioritizing which data needs to be accessed immediately, and which is less important. Data that doesn't require immediate access could be assigned a longer recovery time and less frequent backups. Important data, like financial and compliance records, could be assigned more urgent RPOs and RTOs, or even a backup server that takes over from the main server during the disaster recovery process.
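The prioritization described above can be written down as tiers. In the sketch below, each data set is mapped to a tier with its own RPO and RTO; the RPO gives the longest allowable gap between backups, and the RTO gives a restore order. The tier names, the data sets and every duration are hypothetical.

```python
from datetime import timedelta

# Hypothetical tiers: tighter objectives for more important data.
TIERS = {
    "critical": {"rpo": timedelta(hours=1), "rto": timedelta(hours=2)},
    "standard": {"rpo": timedelta(hours=24), "rto": timedelta(hours=48)},
    "archive": {"rpo": timedelta(days=7), "rto": timedelta(days=14)},
}

# Hypothetical mapping of data sets to tiers.
DATA_TIERS = {
    "financials": "critical",
    "compliance records": "critical",
    "marketing assets": "standard",
    "old project files": "archive",
}


def max_backup_interval(dataset: str) -> timedelta:
    """Backups must run at least this often, or more than the RPO could be lost."""
    return TIERS[DATA_TIERS[dataset]]["rpo"]


def restore_order(datasets: list[str]) -> list[str]:
    """Restore the data sets with the shortest RTO first."""
    return sorted(datasets, key=lambda name: TIERS[DATA_TIERS[name]]["rto"])


order = restore_order(
    ["old project files", "marketing assets", "financials", "compliance records"]
)
print(order)
print(max_backup_interval("financials"))
```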
Step 5: Implement a cloud data storage solution
Disasters like cyber attacks and ransomware attacks could destroy an organization's primary data storage solution, resulting in the permanent loss of that data. Cloud-based solutions can automatically copy data off-site every few days (or even every few hours). Unlike older, manual backup methods requiring users to copy data to a disk or USB drive, backups via a cloud-based solution can be carried out at any time, and without having to access physical media.
Similarly, if the physical assets storing your data are damaged by fire, flood or human tampering, a remote data backup can help minimize business disruptions.
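Whatever backup service is used, a copy is only useful if it matches the original. The sketch below copies a file and compares SHA-256 checksums of the source and the copy. It is a minimal illustration using only the Python standard library: a local directory stands in for the remote destination, and the file name and contents are made up.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy `source` into `backup_dir` and confirm the copy is identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)
    return sha256_of(source) == sha256_of(target)


# A temporary directory stands in for the real source and the off-site target.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "ledger.csv"
    source.write_text("invoice,amount\n1001,250.00\n")
    verified = backup_and_verify(source, Path(tmp) / "offsite")
print(verified)
```

A real cloud backup would replace the `shutil.copy2` call with the provider's upload mechanism, but the verify-after-copy step carries over.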
Disaster recovery plan review
Here, the DR plan owner and other members of the team behind its development and implementation closely review the plan, to find any inconsistencies or missing elements.
In a tabletop exercise, much like a first rehearsal, stakeholders walk step by step through all the components of a disaster recovery plan. This helps determine if everyone knows what they are supposed to do in case of an emergency and uncovers any inconsistencies, missing information or errors.
Simulating disaster scenarios is a good way to see whether the disaster recovery procedures and resources, including backup systems and recovery sites allocated for disaster recovery and business continuity, actually work. A simulation involves running a variety of disaster scenarios to see if the teams involved in the DR process can restart technologies and business operations quickly and effectively. This process can determine whether there are enough staff to execute the DR plan properly.
Image source: TechTarget https://cdn.ttgtmedia.com/rms/onlineimages/disaster_recovery-bcdr_planning_scenarios_desktop.png
With your DR plan in place, and team members ready for any scenario, disaster recovery testing can go ahead. But a disaster recovery plan is only as good as its weakest link, so organizations should commit to regular disaster recovery testing. The frequency of testing depends on the business, but it is another element of a DR plan that needs to be determined ahead of time.
Why monitoring and performance management should be part of every DR plan
Having third-party monitoring tools in place can actually help to avoid certain disasters or at least reduce their severity. Monitoring can pick up anomalies within a system, and identify potential issues. IR's Collaborate suite of performance solutions provides the insights an organization needs to make proactive business decisions, formulate effective DR plans, and create an efficient production environment.
- Comprehensive monitoring, surveillance, alerting, and reporting help you meet and manage your SLAs by ensuring your systems and applications are running at peak performance.
- Gather real-time intelligence across a wide range of data points and criteria.
- Customizable dashboards provide deep visibility that can help identify problems in real time. This allows you to take immediate action to solve issues before they impact the broader business.
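To give a feel for what "picking up anomalies" means, here is a generic sketch that flags measurements lying far from the average. It is not how any particular monitoring product works; the function name, the threshold and the latency readings are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(samples: list[float], threshold: float = 2.0) -> list[int]:
    """Return indexes of samples more than `threshold` standard deviations from the mean."""
    if len(samples) < 2:
        return []
    average = mean(samples)
    spread = stdev(samples)
    if spread == 0:
        return []  # all readings identical, nothing stands out
    return [
        index
        for index, value in enumerate(samples)
        if abs(value - average) / spread > threshold
    ]


# Hypothetical response times in milliseconds, with one obvious spike.
latency_ms = [42, 40, 45, 43, 41, 44, 390, 42, 43, 41]
anomalies = find_anomalies(latency_ms)
print(anomalies)
```

A single large spike also inflates the mean and standard deviation it is measured against, so this simple approach suits small illustrations better than production alerting, where rolling baselines are more common.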
Disaster recovery testing checklist
- Clearly identify goals, objectives and procedures, and plan how the post-testing analysis will be carried out
- Create a test team, including subject matter experts, and make sure everyone is available for the planned testing date
- Determine exactly what to test
- Carefully document and be prepared to edit your DR plan and disaster recovery testing scripts
- Include all relevant technology elements and processes being tested in the plan
- Ensure the test environment is ready, and won't affect production systems or conflict with other activities
- If testing is going to take a significant amount of time, schedule it far in advance
- Perform a practice exercise before the disaster recovery test goes live to uncover and fix potential problems
- Stop and review the test when issues arise and reschedule if necessary
- Keep comprehensive records of start and end times, what occurred, what worked and what didn't.
- Update disaster recovery and business continuity plans and other documents based on what's been learned from the DR test.
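The record-keeping items on the checklist can be supported by a small structured log. The sketch below records each step's start time, end time and outcome, then serializes the log so it can be filed with the plan. The class names, step names and timestamps are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime


@dataclass
class StepRecord:
    name: str
    started: str  # ISO 8601 timestamp
    ended: str    # ISO 8601 timestamp
    succeeded: bool
    notes: str = ""


@dataclass
class DrTestLog:
    scenario: str
    steps: list[StepRecord] = field(default_factory=list)

    def record(self, name: str, started: datetime, ended: datetime,
               succeeded: bool, notes: str = "") -> None:
        """Append one step with its timing and outcome."""
        self.steps.append(
            StepRecord(name, started.isoformat(), ended.isoformat(), succeeded, notes)
        )

    def failed_steps(self) -> list[str]:
        """Names of the steps that did not work, for the post-test review."""
        return [step.name for step in self.steps if not step.succeeded]

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


# Hypothetical test run.
log = DrTestLog("Primary data center power loss")
log.record("Restore database", datetime(2024, 3, 2, 9, 0),
           datetime(2024, 3, 2, 10, 15), True)
log.record("Fail over DNS", datetime(2024, 3, 2, 10, 15),
           datetime(2024, 3, 2, 11, 40), False, "TTL too long, clients cached old address")
print(log.failed_steps())
print(log.to_json())
```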