What is Test Data Management (TDM): Strategy with Example

This tutorial will explain the Test Data Management Concept, Process, and Strategy that are crucial for effective software testing.

In the last tutorial, we focused on how to prepare a Test Bed to minimize Test Environment defects. In continuation with the same tutorial, today we will learn how to set up and maintain a Test Environment and important Test Data Management techniques.

Test Environment setup process

The most important factor for the test environment is to replicate it as close to the end-user environment as possible. End users do not need to handle any configuration or installations themselves, as a fully functional product or system is provided to them. Hence, by that definition, even the test teams need not explicitly perform such configurations.

Test Data Management

If any configurations are necessary purely for testing (but will later be configured for end-users), administrators must be identified. Those administrators who configure the development environment must be the same people who configure the test environment.

If the development team itself takes the initiative in installation/configuration, then they must help to do the same, even in the test environment.

For example, if you have to test an application (with its associated middleware to be installed and configured) on a system across various OS platforms, etc., the best way to address this is to use Virtualization or Cloud environments.

Have a master system on which all the applications and required middleware are correctly installed and configured. Then capture this system as a master image and clone several instances from it, so that each user effectively has a dedicated system with the application under test.

The figure below depicts what a Test Environment setup process would entail:

[Figure: Test Environment Setup Process]

Maintenance of a Test Environment

Much has been said about test environment preparation and its challenges. These challenges are reason enough to maintain and standardize the test environment. Testers often lose testing time because of environment or setup issues.

With the rapid growth in operating systems and in the range of hardware and software, the environment has to be almost dynamic to cope with testing needs. A good test management process helps test teams deliver a high-quality product while making optimal use of limited resources.

Key Pointers To Ensure Effective Maintenance Of Test Environment

As test environments often contain heterogeneous platforms and stacks, presented below are some key pointers to ensure effective maintenance of the test environment.

#1) Effective environment sharing and distribution

As mentioned earlier, one of the key challenges of test environment preparation is that many teams or people need to use the same set of resources for their testing. Hence, a suitable sharing mechanism needs to be developed that caters to the needs of all teams and people without delaying schedules.

This can be achieved by maintaining a repository or information link that records:

Who is using the environment.
When the environment is free to be used.
How environment usage time is distributed among users, recorded accurately.
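As a rough illustration, such a repository could be as simple as a shared booking registry. The sketch below is an in-memory stand-in only; the class name `EnvironmentRegistry`, its methods, and the environment and team names are assumptions made for this example, and a real team would more likely back this with a shared calendar, wiki, or database.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Booking:
    environment: str
    team: str
    start: datetime
    end: datetime  # exclusive: the slot is free again at this instant


class EnvironmentRegistry:
    """Hypothetical in-memory stand-in for the shared environment usage repository."""

    def __init__(self):
        self._bookings = []

    def book(self, environment, team, start, end):
        """Record a booking, refusing it if it overlaps an existing one."""
        if end <= start:
            raise ValueError("end must be after start")
        for b in self._bookings:
            if b.environment == environment and start < b.end and b.start < end:
                raise ValueError(f"{environment} is already booked by {b.team}")
        booking = Booking(environment, team, start, end)
        self._bookings.append(booking)
        return booking

    def current_user(self, environment, at):
        """Return the team using the environment at the given time, or None if free."""
        for b in self._bookings:
            if b.environment == environment and b.start <= at < b.end:
                return b.team
        return None

    def hours_by_team(self, environment):
        """Total booked hours per team, showing how usage time is distributed."""
        totals = {}
        for b in self._bookings:
            if b.environment == environment:
                hours = (b.end - b.start).total_seconds() / 3600
                totals[b.team] = totals.get(b.team, 0.0) + hours
        return totals
```

Rejecting overlapping bookings at entry time is one design choice; a team could equally allow overlaps and only report them.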

By proactively determining where demand for resources is high relative to their limited availability, much of the chaos can be avoided.

The second aspect of this is to revisit the teams’ resource requirements for each testing cycle and look for which resources are not utilized very heavily. Analyze if we can replace those particular resources with any new resources or systems that may be needed.

#2) Sanity checks

Some test requirements need a comprehensive test setup or setup that involves elaborate steps that are extremely time-consuming. This is specifically the case during end-to-end testing, which involves two or more components working together. Hence, the same test environment may need to be re-used by multiple teams.

In such cases, having a good understanding of the entire environment and collating what kind of tests are being performed by various teams will paint a reasonable picture to help provide those specific resources to the respective teams.

By taking into account the factors mentioned above, conducting basic sanity testing can speed up individual team tests or notify them promptly if any changes or fixes are needed in the environment.
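A basic sanity run can be sketched as a set of named checks executed before teams start their tests. The helper names `run_sanity_checks` and `port_is_open` are assumptions for this example, and a real environment would have its own list of checks.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_sanity_checks(checks):
    """Run each named check and return {name: reason} for the ones that fail.

    `checks` maps a name to a zero-argument callable that returns True when healthy.
    A check that raises is reported as a failure instead of aborting the whole run.
    """
    failures = {}
    for name, check in checks.items():
        try:
            if not check():
                failures[name] = "check returned False"
        except Exception as exc:
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

A check for a database might then be registered as `{"database": lambda: port_is_open("db.test.local", 5432)}`, where the host name and port are placeholders. An empty result means the environment passed every check.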

#3) Keeping track of any outages

Just as individual teams own and maintain their test environments, an organization may have all of its test environments maintained by a global support team.

Just as teams that own a test environment schedule local downtime for firmware or software upgrades, the global team must ensure that all environments adhere to the latest standards, which may involve power or network outages.

Hence, those maintaining the test environment must monitor any such outages that may happen and inform the test team beforehand to plan their work accordingly.
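One simple way to support this is to compare planned outages against each team's test window. The sketch below assumes outages are kept as plain `(description, start, end)` tuples; that shape and the function name `overlapping_outages` are illustrative only.

```python
from datetime import datetime


def overlapping_outages(outages, window_start, window_end):
    """Return the outages that intersect the planned test window.

    Each outage is a (description, start, end) tuple. Intervals are treated as
    half-open, so an outage ending exactly when the window starts does not count.
    """
    return [
        (description, start, end)
        for description, start, end in outages
        if start < window_end and window_start < end
    ]
```

Any outage returned here is one the test team should be told about before the window begins.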

#4) Virtualize wherever possible

This is again very relevant where the environment is shared and resources need to be optimized. In such cases, using a virtualized environment, such as a cloud, for testing is the answer.

With such an environment, all the testers need to do is provision an instance. Once provisioned, this instance forms an independent Test Bed or Test Environment containing all the resources required for the testing, such as a dedicated OS, database, middleware, automation frameworks, etc.

Once the testing is concluded, these instances can be destroyed, reducing costs for an organization. Cloud environments are particularly useful for functional verification testing and automation testing areas.
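The provision-then-destroy lifecycle can be sketched with a context manager, so that an instance is destroyed even when a test fails midway. `FakeCloud` below is a purely hypothetical in-memory provider used to keep the example self-contained; a real setup would call the API of whichever cloud or hypervisor is in use.

```python
from contextlib import contextmanager
import itertools


class FakeCloud:
    """Hypothetical in-memory provider standing in for a real cloud or hypervisor API."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.running = {}  # instance id -> image it was cloned from

    def provision(self, image):
        instance_id = f"i-{next(self._ids)}"
        self.running[instance_id] = image
        return instance_id

    def destroy(self, instance_id):
        self.running.pop(instance_id, None)


@contextmanager
def ephemeral_test_bed(cloud, image):
    """Provision an instance from a master image and always destroy it afterwards."""
    instance_id = cloud.provision(image)
    try:
        yield instance_id
    finally:
        # Runs on success and on failure, so no instance is left running and billed.
        cloud.destroy(instance_id)
```

Tests would then run inside `with ephemeral_test_bed(cloud, "master-image-v1") as instance_id:`, where the image name is a placeholder.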

#5) Regression Testing/Automation

As and when new functions and features are being developed, regression tests need to be performed for these functions for every release cycle. Despite seeming static, the regression test environments are always evolving with each release to accommodate new features being implemented.

Every product release cycle would have one or more rounds of regression testing. Thus, establishing regression test environments for every product release cycle and re-using them within the cycle helps demonstrate the stability of the test environment.

Developing automation frameworks and using automation for regression tests also helps improve the efficiency of a test environment, because automation assumes that the environment is stable and that any defects found are purely feature- or code-related.

#6) General governance

When there are issues with the test environment hardware or software that cannot be fixed internally by those maintaining the lab, these issues must be directed to the right people to ensure they are fixed.

For example, if testing uncovers a defect caused by a limitation in the firmware or software used in the current environment, this generally cannot be fixed solely by those responsible for environment maintenance.

Hence, the consumer (who is the tester in this case) must be asked to raise appropriate service requests. These must be directed to the appropriate vendor or team, and regular coordination with them is needed to ensure that the next version fixes the particular problem.

Another aspect of governance is to provide detailed environment reports to the management or stakeholders from time to time, which promotes transparency and forms a good basis for any analysis.

Test Data Preparation

Let’s now look at the latter portion of Test Bed creation, which involves setting up the test data. Much has been said about the test environment, but its true essence, robustness, and efficiency can only be measured with test data. By definition, test data is any kind of input given to the software code being tested.

Even though we spend a good amount of time designing test cases, test data is important because it helps ensure testing coverage for all kinds of scenarios, thereby improving quality. Some test data is needed for happy or positive path testing.

Some other data could be designed for error or negative testing, which is very helpful in discovering how the application performs when put in abnormal situations.

Test data is generally created before test execution begins, because every test environment has its own set of complexities and preparing the data may itself be a long-drawn-out process. Generally, the test data sources include the internal development team and the end-users who use the code or feature.

For example, Functional testing

Let’s take an example where you need to perform functional testing or black-box testing. Here, the objective is that the code has to functionally meet the requirements that are specified.

In such situations, test case preparation should typically involve data coverage of the following types:

Positive Path data: With the development use case document as the reference, this is the data used to perform the positive path scenarios.
Negative Path data: Data that is considered “invalid” with respect to the correct functional working of the code.
Null Data: Supplying no data when the application or code expects that data.
Erroneous Data: Data supplied in an illegal format, to determine how the code behaves.
Boundary Conditions Data: Test data at, or just outside, the valid limits (for example, beyond the bounds of an index or array) to determine how the code performs.
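The five data types above can be illustrated against a small function. Everything in this sketch is hypothetical: `parse_quantity` stands in for the code under test, and the rule that a quantity must be a whole number from 1 to 100 is an assumption made for the example.

```python
def parse_quantity(raw):
    """Hypothetical function under test: parse an order quantity from 1 to 100."""
    if raw is None or raw == "":
        raise ValueError("quantity is required")
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise ValueError("quantity must be a whole number") from None
    if not 1 <= value <= 100:
        raise ValueError("quantity must be between 1 and 100")
    return value


# Each case is (input, expected result or expected exception type).
TEST_DATA = {
    "positive": [("5", 5), ("42", 42)],
    "negative": [("-3", ValueError), ("500", ValueError)],
    "null": [(None, ValueError), ("", ValueError)],
    "erroneous": [("five", ValueError), ("4.5", ValueError)],
    "boundary": [("1", 1), ("100", 100), ("0", ValueError), ("101", ValueError)],
}


def run_cases(func, cases):
    """Return (input, expected, actual) for every case whose outcome does not match."""
    mismatches = []
    for raw, expected in cases:
        try:
            actual = func(raw)
        except Exception as exc:
            actual = type(exc)
        if actual != expected:
            mismatches.append((raw, expected, actual))
    return mismatches
```

Keeping the data in a table separate from the test logic makes it easy to see which of the five categories are covered and which are thin.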

Test data plays a key role in identifying where a product or feature can completely break. Make it a practice to review and validate the kind of data fed to the test environment in different phases of testing.

Test Data Management

When test data plays such an important role in assuring the quality of the product, it’s reasonable to say that its management and streamlining also play an equally important role in Quality Assurance of any product that has to be released to the customers.

Need for Test Data management and best practices:

#1) Many organizations have ever-changing business goals to meet user needs, so it’s crucial to have the right test data to ensure quality testing. This will involve setting up the exact data for the respective test environments and monitoring the behavioral patterns.

As already discussed, a large chunk of a testing team’s time is expended in the planning of test data and its related tasks. Testing of functionality can be significantly impeded by the unavailability of appropriate test data, creating a critical challenge for complete coverage.

#2) Also, sometimes for certain testing requirements, test data needs to be constantly refreshed. This itself causes a lot of delay in the cycle because of constant re-work, which also increases the cost of the application reaching the market.

At other times, if the product being shipped involves different workgroup units in a large organization, creating and refreshing test data requires an intricate level of coordination across these workgroups.

#3) Even though the test teams need to create all kinds of data that are possible to ensure adequate testing, organizations must also consider that doing this would mean that all the different data need to be stored in some kind of repository.

Although having a repository is good practice, storing excessive and unwanted data significantly increases the storage space required. Without version maintenance and archiving of the repository, it also becomes increasingly difficult to fetch the appropriate data for the testing in question.

Most organizations face these common challenges concerning test data. Thus, management strategies need to be put in place to minimize these challenges.

Below are some suggested methodologies for managing test data and keeping it relevant to testing needs. These practices are basic and generic, and will commonly work for most organizations. How they are adopted is purely at the discretion of the respective organizations.

Test Data Management Strategies

#1) Analysis of data

Test data is constructed based on the test cases to be executed. For example, in a system testing team, the end-to-end test scenario needs to be identified first, and the test data is then designed around it. This could involve one or more applications working together.

Consider a product that does workload management: it involves the management controller application, the middleware applications, and the database applications all functioning in correlation with one another. The required test data for these could be scattered. A thorough analysis of all the different kinds of data that may be required has to be made to ensure effective management.

#2) Data setup to mirror the production environment

This is an extension of the previous step and helps you understand what the end-user or production scenario will be and what data it requires. Compare that data with the data that currently exists in the test environment. Based on this comparison, data may need to be created or modified.
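The comparison step can be sketched as a gap check on a single field. The function name `coverage_gaps` and the record shape (a list of dictionaries) are assumptions for this example; in practice the production side would usually be an anonymized profile or summary, not raw production records.

```python
from collections import Counter


def coverage_gaps(production_rows, test_rows, field):
    """Return values of `field` seen in production but absent from the test data."""
    production_values = Counter(row[field] for row in production_rows)
    test_values = {row[field] for row in test_rows}
    # Most frequent production values come first, so the biggest gaps are listed first.
    return [
        value
        for value, _count in production_values.most_common()
        if value not in test_values
    ]
```

Each value returned is a candidate for new or modified test data.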

#3) Determination of the Test Data clean-up

Based on the testing requirement in the current release cycle (where a release cycle can span a long time), the test data may need to be altered or created, as stated in the above point. This test data, although not immediately relevant, may be required at a later point. Hence, a clear process for deciding when test data can be cleaned up should be formulated.
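One possible clean-up rule is sketched below: a data set becomes eligible for clean-up once it has not been used for a retention period, unless someone has explicitly pinned it for later use. The 180-day default, the `pinned` flag, and the metadata shape are all assumptions for illustration.

```python
from datetime import date, timedelta


def datasets_to_clean(datasets, today, retention_days=180):
    """Return the names of data sets that are eligible for clean-up.

    `datasets` maps a name to a dict with `last_used` (a date) and `pinned` (a bool).
    Pinned data sets are kept regardless of age.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(
        name
        for name, meta in datasets.items()
        if not meta["pinned"] and meta["last_used"] < cutoff
    )
```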

#4) Identify sensitive data and protect it

Often, extensive amounts of highly sensitive data are necessary for thorough application testing. For example, a cloud-based test environment is a popular choice because it enables on-demand testing of different products.

However, something as basic as guaranteeing user privacy in a cloud is a cause for concern. So, especially where we need to replicate the user environment, a mechanism to shield sensitive data must be identified. The choice of mechanism is largely governed by the volume of test data.
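One common shielding mechanism is pseudonymization, sketched here with a keyed hash from the Python standard library. The field names, the `tok_` prefix, and the 12-character token length are assumptions for this example, and this approach is only one of several (others include generating fully synthetic data or subsetting).

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace a sensitive string with a stable token derived from a keyed hash."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"tok_{digest[:12]}"


def mask_record(record, sensitive_fields, secret_key):
    """Return a copy of the record with the listed string fields pseudonymized."""
    return {
        field: pseudonymize(value, secret_key) if field in sensitive_fields else value
        for field, value in record.items()
    }
```

Because the same input and key always yield the same token, records that referred to the same person before masking still match afterwards, which keeps joins across tables working. The key itself must be kept out of the test environment.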

#5) Automation

Just as we adopt automation for running repetitive tests or for running the same tests with different data, it’s also possible to automate the creation of test data. This helps expose data-related errors during testing. One approach is to compare the results produced by consecutive test runs over the same set of data, and then to automate that comparison.
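Both ideas, automated data creation and automated comparison of consecutive runs, can be sketched briefly. The order fields and the names `generate_orders` and `diff_runs` are invented for this example; the fixed seed is what makes the generated data reproducible from one run to the next.

```python
import random


def generate_orders(count, seed):
    """Generate reproducible synthetic order records from a fixed seed."""
    rng = random.Random(seed)  # a private generator, so global random state is untouched
    return [
        {"order_id": i, "quantity": rng.randint(1, 100), "express": rng.random() < 0.2}
        for i in range(1, count + 1)
    ]


def diff_runs(previous, current):
    """Return {key: (old, new)} for results that changed between two test runs."""
    keys = previous.keys() | current.keys()
    return {
        key: (previous.get(key), current.get(key))
        for key in sorted(keys)
        if previous.get(key) != current.get(key)
    }
```

If the input data is identical across runs, any difference reported by `diff_runs` points at a change in the code or the environment, not in the data.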

#6) Effective data refresh using a central repository

This is by far the most important methodology and forms the heart of implementing data management. All the points mentioned above, especially those concerning data setup and data clean-up, directly or indirectly relate to this.

A lot of effort in creating test data can be saved by maintaining a central repository that contains all kinds of data that may be required for various kinds of testing. How is this done? In consecutive test cycles, for either a new test case or a modified test case, check if the data exists in the repository. If it does not exist, feed that data into the test environment first.

Next, add that data to the repository for future reference. Now, for consecutive release cycles, the test team can use all or a subset of this data. Isn’t the advantage very apparent?

Depending on the sets of data that are frequently used, obsolete data can be easily eliminated and hence ensure correct data is always present, reducing the cost to store that unneeded data.

Second, you can also have a couple of versions of this repository saved or can revise it as necessary. Having different versions of the repository can help greatly in regression testing to identify what change in data can cause the code to break.
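The check-then-store flow and the versioning idea can be sketched together. `DataRepository` below is an in-memory stand-in with invented method names; a real central repository would typically live in a database or version-controlled storage.

```python
import copy


class DataRepository:
    """Hypothetical in-memory stand-in for a central, versioned test data repository."""

    def __init__(self):
        self._data = {}
        self._versions = {}

    def get_or_create(self, key, factory):
        """Return the stored data for `key`, creating and storing it on first use."""
        if key not in self._data:
            self._data[key] = factory()
        # Hand out a copy so a test cannot accidentally alter the stored data.
        return copy.deepcopy(self._data[key])

    def put(self, key, value):
        """Store or overwrite the data for `key`."""
        self._data[key] = copy.deepcopy(value)

    def snapshot(self, label):
        """Save the current contents under a version label, such as a release name."""
        self._versions[label] = copy.deepcopy(self._data)

    def changed_keys(self, old_label, new_label):
        """List the keys whose data differs between two saved versions."""
        old, new = self._versions[old_label], self._versions[new_label]
        return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

When a regression appears between two releases, `changed_keys` narrows the search to the data that actually changed between the two snapshots.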

Conclusion

The test environment should be of prime importance to every test team. An unreliable and unplanned test environment brings many new challenges in every release cycle.

To improve their operations, organizations are implementing strategies such as establishing dedicated Test Environment Maintenance teams to ensure smoother release cycles.

Improved testing is only the most obvious effect of streamlining test data management. A key benefit is that it provides a cost-effective solution for organizations without compromising the reliability of the product.

Let us know how you manage your test environment and how you prepare test data. Would you like to contribute any tips?

12 thoughts on “What is Test Data Management (TDM): Strategy with Example”

James Smith

December 9, 2024 at 1:39 am

Thanks for this article. I am looking forward to more such articles. Also, talking about Test data management reminds me of Enov8 a company providing the said services in the most optimal way!
Suganya

September 21, 2024 at 9:52 am

Is there any open source for “Test Data Management”
Syam

July 26, 2024 at 4:33 pm

how do we implement TDM when there are 3 DB’s(Oracle, DB2, SAP) and data masking is going to implement only on Oracle . what could be the potential impact and solution when the masked data is flowing into downstream systems as DB2 and SAP .
Carolyn

December 8, 2023 at 10:18 pm

This was a very informative and helpful article (both Part 1 and Part 2). I appreciate that you want organizations to use this material and adapt it to their own use. Thank you so much – this was exactly what I was looking for 🙂
Saritha

October 24, 2023 at 9:57 pm

I will post a commonly asked interview question and let me know your best answer to it. You being the expert I wanted to know the answer from you.

Question: If there are lot of test cases to be executed and testing time is very less (true in most projects and companies), then as a tester how would you ensure the quality of the product? How will you make sure that almost all the test cases are executed?

My answer: Based on Technical and Business risk I would prioritize my test cases and go from there. high risk areas are executed first, medium and then low risk. Sometimes if there is not time its okay to skip low risk areas.
Technical risk is given by Developers and Business risk either by Business analyst, customers, systems analyst depending on the organization.

Just wanted to know your opinion/answer based on your experience. Also let me know if my answer makes sense or any questions.
Vaibhav Srivastava

March 16, 2023 at 5:30 pm

Hi,

Its really a very nice article and I am looking forward more about it.

@Sheetal, try to look on ‘CA LISA for Test Data Management’. Its a good tool.

Thanks,
Vaibhav S
Peter

February 25, 2022 at 5:06 pm

Both Part 1 and Part 2 do provide a very good overview of the domain. Thanks a lot for taking the effort to write the articles.

@Sheetal, another good tool is CA Test Data Management (formerly Gridtools).
Rohit

December 22, 2021 at 5:42 pm

can we use tool for test data generation? know any such tool?
Danielle Felder

December 18, 2021 at 11:42 am

Great article. Following Peter’s comment, you might find real user reviews for CA Test Data Management on IT Central Station to be helpful

As an example, this user writes that with this tool, “we’ve saved over eleven thousand hours in manual time of trying to create test data. In that same time, that eleven thousand hours has translated to about eleven million dollars in cost savings.”
Sneha STH Author

November 24, 2021 at 2:55 pm

@Saritha : While that answer is good, here are some other points you can mention to elaborate the risk analysis that you mention when there isn’t sufficient time to test.

– What are the functions that are most apparent to the user?
– What are the functions that meet the specifications ( happy path tests) ?
– What kind of tests would cover multiple scenarios?
– What kind of functionality would have a large security impact?
– What kind of tests have been a problematic area or are error prone?
– What kind of tests are complex?
– What kind of functions have been coded in a rush?

And so on.
Terr Silver

November 3, 2021 at 5:14 am

Thanks for the article, too few articles regarding test data.

A question: Any best practices and techniques regarding Test Data and the new EU directive GDPR? Especially when working with a production clone (is anonymize everything the only solution).
Sheetal

July 16, 2021 at 6:45 am

Our test data efforts are mainly concentrated during writing test cases process. We are also thinking to use any tool for creating data. Do you have any idea about such tools?
