What Is Recovery Testing In Software Testing

This tutorial explains what recovery testing is, its life cycle, disaster recovery best practices, and the differences between recovery testing and reliability testing.

Software failures are unavoidable. Some failures do not bring the whole system down, but others can be disastrous. Recovery testing exists to reduce the impact of such disasters.

Let’s go through recovery testing in detail to understand how it helps to minimize the impact of any failure.

What Is Recovery Testing

Recovery testing is a type of non-functional testing that determines how well software can recover from failures such as software or hardware crashes and network failures.

To perform recovery testing, the software or hardware is deliberately made to fail in order to verify:

  • Whether recovery succeeds.
  • Whether the software can carry out further operations afterwards.
  • How long it takes to resume operations.
  • Whether lost data can be recovered completely.
  • The percentage of scenarios in which the system can recover.
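
The checks above can be sketched as a small test harness. This is only an illustration under stated assumptions: `ToyService`, its journal-replay recovery, and `run_recovery_check` are hypothetical names invented for this example, not part of any real tool. A real recovery test would force the failure on the actual system under test.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ToyService:
    """Hypothetical in-memory service with a durable journal (illustration only)."""
    journal: list = field(default_factory=list)  # assumed to survive a crash
    state: dict = field(default_factory=dict)    # volatile, lost on a crash
    running: bool = True

    def write(self, key, value):
        if not self.running:
            raise RuntimeError("service is down")
        self.journal.append((key, value))  # persist first, then apply
        self.state[key] = value

    def crash(self):
        # Forced failure: volatile state disappears, the journal remains.
        self.state = {}
        self.running = False

    def recover(self):
        # Rebuild the volatile state by replaying the journal.
        self.state = dict(self.journal)
        self.running = True


def run_recovery_check(service, records):
    """Force a failure, recover, and report the checks listed above."""
    for key, value in records.items():
        service.write(key, value)
    service.crash()

    started = time.perf_counter()
    service.recover()
    duration = time.perf_counter() - started

    try:
        service.write("after_recovery", 1)  # can further operations run?
        resumed = True
    except RuntimeError:
        resumed = False

    intact = sum(1 for k, v in records.items() if service.state.get(k) == v)
    return {
        "recovered": service.running,
        "operations_resumed": resumed,
        "recovery_seconds": duration,
        "data_recovered_pct": 100.0 * intact / len(records),
    }


# Run several scenarios and compute the share that recovered completely.
scenarios = [{"a": 1}, {"a": 1, "b": 2}, {"x": 9, "y": 8, "z": 7}]
reports = [run_recovery_check(ToyService(), s) for s in scenarios]
full_recoveries = sum(
    1 for r in reports if r["recovered"] and r["data_recovered_pct"] == 100.0
)
success_pct = 100.0 * full_recoveries / len(reports)
```

Each report maps directly onto one of the questions in the list, and `success_pct` corresponds to the percentage of scenarios in which the system recovered.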

Before this testing is performed, a backup is taken and saved to a secure location to avoid data loss in case the data cannot be recovered successfully.

Common failures that should be tested for recovery:

  1. Network issue
  2. Power failure
  3. External server not reachable
  4. Server not responding
  5. Missing DLL file
  6. Database overload
  7. Stopped services
  8. Physical conditions
  9. External device not responding
  10. Wireless network signal loss

Life Cycle Of Recovery Testing

The life cycle includes:

#1) Standard Operations

Standard operations are the way the system is intended to work: the system is set up with all the required hardware and software so that it runs as expected.

#2) Disaster and Failure Occurrence

A failure or disaster can occur for various reasons, such as physical conditions, power failure, an unreachable server, hardware failure, and many more.

#3) Interruption to Standard Process

An interruption to standard processes can lead to losses in business, client relationships, money, market reputation, and so on.

#4) Recovery Process

To avoid major losses, companies have backup plans so that an interruption has minimal impact on the system.

#5) Rebuild Process

The rebuild process follows previously defined documents and procedures. All folders and configuration files are rebuilt to restore the lost data.

Examples Of Recovery Testing

  • While downloading data to your system, turn off the Wi-Fi connection, turn it on again after some time, and observe whether the download continues or the data is lost.
  • Open more than one browser session and restart the system. Once the system has restarted, verify whether all the sessions are reloaded.
  • While the application is receiving data from the network, unplug the cable to force a failure. After some time, plug the cable in again and observe whether the data is recovered and the application continues to receive data from the point where it lost the connection.
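
The download and cable examples can be approximated in code by simulating the dropped connection. The sketch below is an assumption-laden illustration: `ConnectionLost`, `fetch_chunks`, and `download_with_resume` are hypothetical helpers, and the "network" is just an in-memory byte string. It checks that a transfer resumes from the byte where the link dropped and that nothing is lost or duplicated.

```python
class ConnectionLost(Exception):
    """Raised by the simulated network when the link drops."""


def fetch_chunks(source, offset, chunk_size, fail_after=None):
    """Yield chunks of `source` starting at `offset`.

    `fail_after` is the number of bytes delivered before the simulated
    disconnect; None means the connection stays up.
    """
    sent = 0
    while offset < len(source):
        if fail_after is not None and sent >= fail_after:
            raise ConnectionLost(f"link dropped at byte {offset}")
        chunk = source[offset:offset + chunk_size]
        yield chunk
        offset += len(chunk)
        sent += len(chunk)


def download_with_resume(source, chunk_size=4, fail_after=None):
    """Download `source`, resuming from the last received byte after a drop."""
    received = bytearray()
    interruptions = 0
    while len(received) < len(source):
        try:
            for chunk in fetch_chunks(source, len(received), chunk_size, fail_after):
                received.extend(chunk)
        except ConnectionLost:
            interruptions += 1
            fail_after = None  # the link is restored for the retry
    return bytes(received), interruptions


payload = b"recovery testing payload"
data, drops = download_with_resume(payload, chunk_size=4, fail_after=8)
```

Comparing `data` with `payload` answers the question raised in the examples: whether the application continued from where it lost the connection or whether data went missing.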

Steps For Recovery Plan

  • Proper analysis should be done to verify that recovery is possible. It should cover the failures that can occur, their impact, their solutions, and how to trigger them. The system’s ability to allocate extra resources, such as CPU and servers, in case of critical failures should also be analyzed.
  • Test plan: test cases should be designed according to the results of the analysis described in the previous point.
  • The test environment should be built based on the results of the recovery analysis.
  • Backups of data such as software states and database contents should be maintained without fail. Depending on criticality, data can be backed up using the strategies below:
    • Single backup or multiple backups
    • Online or offline backups
    • Backups stored at one location or at multiple locations
    • Automatic backups every “n” minutes, for example every 15 minutes
    • A separate team to perform and track the backups
  • Resources should be allocated for recovery testing.
  • The recovery plan should be documented, and the document should be updated whenever changes are made.
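
The backup step, with copies kept at multiple locations, can be sketched as follows. This is a minimal illustration rather than a backup tool: `take_backup`, `restore`, the file name `orders.db`, and the directory names are all assumptions made for the example, and scheduling the backup every “n” minutes is left out. The sketch verifies each copy with a SHA-256 checksum so that a corrupted backup is skipped during restore.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def take_backup(data_file, backup_dirs):
    """Copy `data_file` to every backup location and return its checksum."""
    checksum = sha256_of(data_file)
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        shutil.copy2(data_file, directory / data_file.name)
    return checksum


def restore(data_file, backup_dirs, expected_checksum):
    """Restore from the first backup copy whose checksum matches."""
    for directory in backup_dirs:
        candidate = directory / data_file.name
        if candidate.exists() and sha256_of(candidate) == expected_checksum:
            shutil.copy2(candidate, data_file)
            return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    live = root / "live" / "orders.db"  # hypothetical data file
    live.parent.mkdir()
    live.write_bytes(b"order-1\norder-2\n")

    locations = [root / "backup_a", root / "backup_b"]
    checksum = take_backup(live, locations)

    # Simulated disaster: live data is lost and one backup copy is corrupted.
    live.unlink()
    (locations[0] / live.name).write_bytes(b"corrupted")

    restored = restore(live, locations, checksum)
    restored_bytes = live.read_bytes() if restored else b""
```

Keeping a second location is what lets the restore succeed here even though the first copy is corrupted, which is the motivation for the multiple-location strategy in the list.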

Disaster Recovery Testing Best Practices

  • The very first step is to have the test environment ready, and it should be a replica of the production (live) environment. The interfaces, hardware, software, code, and firmware should all match the live system. The closer the test environment is to production, the more trustworthy the results.
  • The hardware allocated for restoring the production environment should also be used when performing recovery testing.
  • Testers can use an online backup system for testing, but they must ensure that data can be retrieved easily and that there are no security issues.

Advantages/Disadvantages

Advantages:

  • It helps make the system more stable and improves the quality of the product.
  • The system becomes more reliable because bugs are fixed before go-live, which also improves its performance.
  • A backup is always maintained to recover data in case of any failure.

Disadvantages:

  • A trained tester is required to perform this testing. The tester should have everything needed for testing, i.e. the data and the backup files.
  • Recovery testing requires several steps before the testing and many more during it, which makes it a time-consuming process.
  • Recovery testing is an expensive process.
  • In some cases, not all potential bugs can be found.

Difference Between Recovery Testing And Reliability Testing

Recovery testing and reliability testing are often confused and treated as the same thing. The two are related, but they are different. The table below summarizes the differences:

| S.No. | Recovery Testing | Reliability Testing |
|---|---|---|
| 1 | Recovery testing is done to verify how well the system recovers after failure or disaster. | Reliability testing is done to find the failure at a specific point where it occurs. |
| 2 | Finds out if the system is able to continue operations after the disaster. | Failures are found and fixed before the deployment. |
| 3 | Recovery testing determines its capability to recover back the data from power failures, network issues, etc. | The application is tested for a specific period of time and the environment. If the test results are consistently the same then only it is considered as a reliable application. |

Template For Disaster Recovery Testing

A template, i.e. a pre-formatted document, is used to plan the recovery from any disaster. Companies can tailor templates to their own requirements, but a few elements are mandatory.

Let’s check out those elements that must be part of the template:

  1. Definition of Disaster, i.e. situation/condition when it will be considered a disaster.
  2. List of emergency response team members with their complete details, such as name, role, email, and phone number.
  3. Disaster Recovery team details
  4. External Contact list: A list of resources that might be required at the time of disaster recovery.
  5. Risk Management: Covers the potential risks and their documented solutions.
  6. Plan Overview
  7. Emergency Alert, escalation, and activation: Steps to be taken during the emergency.
  8. Insurance Information
  9. Financial and Legal Information
  10. Recovery Plan/Backup Strategy
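
One way to keep a template honest is to check a draft plan against the mandatory elements listed above. The following is a rough sketch only: the section keys, `missing_sections`, and the sample `draft_plan` values are hypothetical and chosen purely for illustration.

```python
# Keys mirror the ten mandatory elements listed above (the names are assumptions).
REQUIRED_SECTIONS = [
    "disaster_definition",
    "emergency_response_team",
    "disaster_recovery_team",
    "external_contacts",
    "risk_management",
    "plan_overview",
    "emergency_alert_escalation_activation",
    "insurance_information",
    "financial_and_legal_information",
    "recovery_plan_backup_strategy",
]


def missing_sections(plan):
    """Return the required sections that are absent or empty in `plan`."""
    return [name for name in REQUIRED_SECTIONS if not plan.get(name)]


# Hypothetical, partially completed draft.
draft_plan = {
    "disaster_definition": "Primary site unavailable for more than one hour",
    "emergency_response_team": [
        {"name": "A. Tester", "role": "Lead", "email": "lead@example.com", "phone": "000-0000"},
    ],
    "plan_overview": "Fail over to the standby site",
    "insurance_information": "",  # present but empty, so it is still reported
}

gaps = missing_sections(draft_plan)
```

A check like this could run whenever the plan document changes, so that an incomplete template is noticed before a disaster rather than during one.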

Frequently Asked Questions

Q #1) How do you perform a recovery test?

Answer: Listed below are a few examples that show how recovery testing is done:

  • Restart the system while the browser has multiple sessions running. Once the system has restarted, verify whether the browser sessions are reloaded.
  • Unplug the cable while the application is receiving data, and check whether the application receives data again once the cable is plugged back in.
  • Restart the system while the application is running, and afterwards verify whether the data is intact or lost.

Q #2) What is disaster recovery testing in software testing?

Answer: Disaster recovery testing is performed to ensure that no data is lost if a failure or disaster occurs. Companies perform this testing so that they can restore their data in case of actual failures.

Q #3) Why is disaster recovery testing important?

Answer: Disaster recovery testing is important because it ensures that, after an interruption, the system works properly, all data is recovered, and all applications are restored. It is essential for keeping the system running without loss.

Q #4) Is recovery testing part of performance testing?

Answer: Yes, this testing falls under performance testing. It is also done along with load testing. Recovery testing shows how well the system will recover in case of any failure or disaster.

Conclusion

Failures can occur at any time for many unavoidable reasons. Recovery testing helps uncover critical bugs and prepares the system to recover from those failures. The more frequently recovery testing is performed, the lower the impact of a failure on the system tends to be, so frequent testing plays an important role in minimizing that impact.

This approach to testing verifies that recovery succeeds when failures occur.