Test Data Management Techniques and Best Practices (Part 2)

In the last tutorial, we focused on how to prepare a test bed to minimize test environment defects. Continuing from there, today we will learn how to set up and maintain a test environment, along with some important test data management techniques.

Test environment setup process

The most important goal for a test environment is to replicate the end user environment as closely as possible. End users are usually not expected to perform any configuration or installation themselves, because a complete product or system is shipped to them. By the same reasoning, test teams should not need to perform such configuration explicitly either.

If such configuration is needed purely for testing purposes (because end users will receive it preconfigured), then the administrators responsible for it must be identified. The administrators who configure the development environment should be the same people who configure the test environment. If the development team itself takes the initiative in installation and configuration, then it must help do the same in the test environment.

Example: suppose you have to test an application, along with the middleware that must be installed and configured with it, on systems across various OS platforms. The best approach is to use virtualization or cloud environments. Set up a master system on which the application and all the required middleware are correctly installed and configured. Then capture this system as a master image and clone several instances from it, so that each tester effectively has a dedicated system with the application under test.

Below is a pictorial depiction of what the test environment setup process entails:

Test Environment Setup Process

Maintenance of a test environment

Having discussed test environment preparation and its challenges, there is clearly strong ground for maintaining and standardizing the test environment. Testers often lose testing time because of environment or setup issues. The number of operating systems and the range of hardware and software are growing rapidly, so the environment has to be almost dynamic in nature to keep up. With a good test management process, test teams can ensure that they deliver a high-quality product while making optimal use of limited resources.

Key pointers to ensure effective maintenance of the test environment:

Because test environments usually contain heterogeneous platforms and software stacks, the following practices help keep them effectively maintained.

#1. Effective environment sharing and distribution:

As mentioned earlier, one of the key challenges of test environment preparation is that many teams or people need to use the same set of resources for their testing. A suitable sharing mechanism therefore needs to be developed that meets the needs of all teams and people without delaying schedules.

This can be achieved by maintaining a repository or shared information link in which the following are recorded accurately:

  1. who is using the environment,
  2. when the environment is free to be used, and
  3. how environment usage time is distributed among teams.
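The following is a minimal sketch of such a sharing record, assuming a simple in-memory model. The `Booking` and `EnvironmentRegistry` names are hypothetical and do not refer to a specific tool. The sketch covers the three points above: who has booked an environment, whether a requested slot is free, and how usage hours are distributed among teams.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Booking:
    """One team's reservation of a shared test environment (hypothetical model)."""
    environment: str
    team: str
    start: datetime
    end: datetime


@dataclass
class EnvironmentRegistry:
    """Shared record of who is using which environment and when."""
    bookings: list[Booking] = field(default_factory=list)

    def conflicts(self, new: Booking) -> list[Booking]:
        """Existing bookings on the same environment that overlap the new slot."""
        # Half-open intervals [start, end) overlap if each starts before the other ends.
        return [
            b for b in self.bookings
            if b.environment == new.environment
            and b.start < new.end
            and new.start < b.end
        ]

    def book(self, new: Booking) -> bool:
        """Record the booking only if the environment is free for the whole slot."""
        if new.end <= new.start:
            raise ValueError("a booking must end after it starts")
        if self.conflicts(new):
            return False
        self.bookings.append(new)
        return True

    def hours_by_team(self, environment: str) -> dict[str, float]:
        """How the usage time of one environment is distributed among teams."""
        totals: dict[str, float] = {}
        for b in self.bookings:
            if b.environment == environment:
                hours = (b.end - b.start).total_seconds() / 3600
                totals[b.team] = totals.get(b.team, 0.0) + hours
        return totals
```

Bookings are treated as half-open intervals, so one team's slot may end exactly when the next one begins. A real setup would more likely keep this record in a shared calendar or database than in memory.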

By proactively identifying where demand for resources exceeds their limited availability, much of the scheduling chaos can be avoided altogether.

A second aspect is to revisit the teams’ resource requirements for each testing cycle and look for resources that are not heavily utilized. Analyze whether those resources can be replaced with new resources or systems that may be needed.
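The per-cycle review of underused resources can be sketched as a simple utilization calculation. Two parts of this are assumptions made for illustration: the 20% threshold, and measuring utilization as booked hours against the hours available in a cycle. The booked hours could come from the usage records described in point 1. The function name `underused_resources` is hypothetical.

```python
def underused_resources(
    booked_hours: dict[str, float],
    available_hours: float,
    threshold: float = 0.2,
) -> list[tuple[str, float]]:
    """Resources whose utilization in a cycle is below the threshold, least used first.

    booked_hours maps a resource name to the hours it was booked during the cycle;
    available_hours is how many hours each resource could have been used.
    """
    if available_hours <= 0:
        raise ValueError("available_hours must be positive")
    utilization = {name: hours / available_hours for name, hours in booked_hours.items()}
    low = [(name, u) for name, u in utilization.items() if u < threshold]
    return sorted(low, key=lambda pair: pair[1])
```

Resources near the top of the returned list are the first candidates for replacement with systems the teams actually need.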

#2. Sanity checks:

Some test requirements need a comprehensive setup involving elaborate, extremely time-consuming steps. This is especially the case in end-to-end testing, where two or more components must work together. Hence, the same test environment may need to be reused by multiple teams.

In such cases, understanding the environment as a whole and collating the kinds of tests that various teams perform gives a reasonable picture of which specific resources to provide to each team.

With these factors in mind, basic sanity testing can be performed on the shared environment. This either speeds up testing for individual teams or alerts them immediately if the sanity checks show that the environment needs changes or fixes.
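One way to automate basic sanity checks is a small runner that executes a list of named checks and reports which ones fail. The sketch assumes that TCP reachability is a meaningful first check for each component of an end-to-end setup. `port_is_open` and `run_sanity_checks` are hypothetical helpers built only on Python's standard `socket` module.

```python
import socket
from typing import Callable


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


def run_sanity_checks(checks: dict[str, Callable[[], bool]]) -> list[str]:
    """Run every check and return the names of those that failed.

    An exception inside a check counts as a failure instead of aborting the run,
    so teams get the full picture of what is broken.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed
```

For example, a check such as `{"database reachable": lambda: port_is_open("db.test.example", 5432)}` could run before each team starts testing. The host and port here are placeholders. A non-empty result would be the signal to alert the teams.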

#3. Keeping track of any outages:

Just as each team that owns a test environment maintains it, an organization typically has a global support team that maintains all of its test environments. Teams that own a test environment have local downtime for firmware or software upgrades. In the same way, the global team has to ensure that all environments adhere to the latest standards, which may involve power or network outages. Those maintaining the test environment must therefore keep track of any such outages and inform the test team in advance so that it can plan its work accordingly.
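Part of this tracking can be automated by checking the published outage schedule against the test runs that teams have planned. This sketch assumes that both are available as simple time windows. The `Window` type and the `affected_runs` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Window:
    """A named time window on one environment: a planned test run or an outage."""
    name: str
    environment: str
    start: datetime
    end: datetime


def affected_runs(runs: list[Window], outages: list[Window]) -> dict[str, list[str]]:
    """Map each planned run to the outages that overlap it on the same environment."""
    report: dict[str, list[str]] = {}
    for run in runs:
        hits = [
            o.name
            for o in outages
            if o.environment == run.environment and o.start < run.end and run.start < o.end
        ]
        if hits:
            report[run.name] = hits  # the owning team should be told in advance
    return report
```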

#4. Virtualize wherever possible:

This is again very relevant when testing is done on a shared environment and there is a pressing need to optimize resources. In such cases, a virtualized environment such as a cloud is the answer. With such an environment, all that testers need to do is provision an instance. Once provisioned, this instance forms an independent test bed or test environment. It contains all the diverse resources required for testing, such as a dedicated OS, database, middleware and automation frameworks.

Once testing is concluded, these instances can be destroyed, greatly reducing costs for the organization. Cloud environments are particularly useful for functional verification testing and automation testing.
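The provision, test and destroy lifecycle maps naturally onto a Python context manager, which guarantees that an instance is destroyed even when tests fail. `FakeCloudProvider` is a hypothetical in-memory stand-in, not a real cloud SDK. In practice, its `provision` and `destroy` methods would call whatever virtualization or cloud API the organization uses to clone instances from the master image.

```python
import contextlib
import itertools
from collections.abc import Iterator


class FakeCloudProvider:
    """Hypothetical in-memory stand-in for a cloud or virtualization API."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.running: dict[str, str] = {}  # instance id -> master image it was cloned from

    def provision(self, image: str) -> str:
        instance_id = f"vm-{next(self._ids)}"
        self.running[instance_id] = image
        return instance_id

    def destroy(self, instance_id: str) -> None:
        self.running.pop(instance_id, None)


@contextlib.contextmanager
def provisioned_instance(provider: FakeCloudProvider, image: str) -> Iterator[str]:
    """Clone an instance from a master image and always destroy it afterwards."""
    instance_id = provider.provision(image)
    try:
        yield instance_id
    finally:
        # Runs even if the tests raise, so no instance is left running (and billed).
        provider.destroy(instance_id)
```

A test session would then be written as `with provisioned_instance(provider, "master-image") as vm:` followed by the tests that run against `vm`.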

#5. Regression testing / Automation:

As new functions and features are developed, regression tests need to be performed for them in every release cycle. On the surface, regression testing seems to run on the same test setup with the same data. In reality, the test environment evolves with every release in line with the features being implemented. Every product release cycle has one or more rounds of regression testing. Establishing a regression test environment for each product release cycle and reusing it within that cycle therefore helps keep the test environment stable.

Developing automation frameworks and automating regression tests also improves the efficiency of a test environment. Because automated runs assume that the environment is stable, any defects they surface can be attributed to features or code rather than to the setup.

#6. General governance:

Some issues with the test environment hardware or software cannot be fixed internally by those maintaining the lab. These issues must be directed to the right people so that they get fixed.

For example, testing may uncover a defect caused by a limitation in the firmware or software used in the current environment. Such a defect generally cannot be fixed by those responsible for environment maintenance alone. The consumer (the tester, in this case) must therefore be asked to raise the appropriate service requests. These requests must be directed to the appropriate vendor or team, and regular coordination with them is needed to ensure that the next version fixes the problem.


Another aspect of governance is to provide detailed environment reports to management or stakeholders from time to time. This promotes transparency and forms a good basis for any analysis.

Test data Preparation:

Let’s now look at the latter part of test bed creation: setting up the test data. However much attention the test environment gets, its true robustness and efficiency can only be measured with test data. By definition, test data is any input given to the software code being tested.

Even though we spend a good amount of time designing test cases, test data is important because it ensures testing coverage across all kinds of scenarios, thereby improving quality. Some test data is needed for happy or positive path testing. Other data is designed for error or negative testing, which is very helpful in discovering how the application behaves in abnormal situations.

Test data is generally created before test execution begins, because every test environment has its own set of complexities and preparing the data can itself be a long, drawn-out process. Typical sources of test data are the internal development team or the end users consuming the code or feature.

Example: Functional testing

Let’s take an example where you need to perform functional or black box testing. Here the objective is to verify that the code functionally meets the specified requirements.

In such cases, the test data prepared for the test cases should generally cover the following kinds of data:

  • Positive path data: Using the development use case document as a reference, this is data that exercises the positive path scenarios.
  • Negative path data: Data that is considered “invalid” with respect to the correct functional working of the code.
  • Null data: No data supplied where the application or code expects it.
  • Erroneous data: Data supplied in an illegal format, to determine how the code behaves.
  • Boundary conditions data: Data at and just beyond the valid limits (for example, an out-of-range array index), to determine how the code performs.
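To make the five categories concrete, here is a sketch for a hypothetical integer `quantity` field that is valid from 1 to 100. The field and the `is_valid_quantity` function, which stands in for the code under test, are both assumptions. In a real project, the valid range and the expected outcomes come from the requirements.

```python
from typing import Any


def quantity_test_data(low: int = 1, high: int = 100) -> dict[str, list[Any]]:
    """Test data by category for a hypothetical integer field valid in [low, high]."""
    return {
        "positive": [low, (low + high) // 2, high],  # valid values from the use case
        "negative": [low - 10, high + 10, -high],    # well-formed but invalid values
        "null": [None, ""],                          # no data where data is expected
        "erroneous": ["ten", "12abc", 3.5],          # illegal format or type
        "boundary": [low - 1, low, high, high + 1],  # at and just beyond the limits
    }


def is_valid_quantity(value: Any, low: int = 1, high: int = 100) -> bool:
    """Hypothetical code under test: accept only integers within the range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high
```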

Test data plays a key role in identifying where a product or feature can break completely. Make it a practice to periodically review and validate the data fed to the test environment in each phase of testing.

Test data management

Since test data plays such an important role in assuring product quality, it’s reasonable to say that managing and streamlining it plays an equally important role in the quality assurance of any product to be released to customers.

Need for test data management and best practices:

  • Many organizations have rapidly changing business goals to cater to end user needs, so appropriate test data is instrumental in determining the quality of testing. This involves setting up exactly the right kind of data for each test environment and monitoring behavioral patterns. As already discussed, a large share of a testing team’s time is spent planning test data and related tasks. Testing of a functionality is often seriously hampered by the non-availability of appropriate test data, which poses a critical challenge to complete test coverage.
  • Some testing requirements also need test data to be refreshed constantly. The constant rework causes considerable delay in the cycle and also increases the cost of bringing the application to market. At other times, the product being shipped may involve different work group units in a large organization. In that case, creating and refreshing test data requires intricate coordination across these work groups.
  • Test teams need to create all possible kinds of data to ensure adequate testing, but organizations must also consider that all of this data then has to be stored in some kind of repository. Having a repository is good practice. However, storing excessive and unwanted data significantly increases the storage space required. If the repository has no version maintenance or archiving, it also becomes increasingly difficult to fetch the right data for a given test.

Most organizations face these common challenges with respect to test data, so management strategies need to be put in place to minimize them.

Below are some suggested methodologies for managing test data and keeping it relevant to testing needs. These practices are basic and generic and will work for most organizations; how they are adopted is at the discretion of each organization.

Test Data Management Strategies:

#1. Analysis of data

Test data is generally constructed based on the test cases to be executed. For example, a system testing team first needs to identify the end-to-end test scenario, and the test data is designed based on it. The scenario could involve one or more applications working together. Say a product does workload management: it involves the management controller application, the middleware applications and the database applications, all functioning in correlation with one another. The required test data could be scattered across these applications. A thorough analysis of all the different kinds of data that may be required has to be made to ensure effective management.
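Below is a minimal sketch of this analysis. It assumes that each test case lists the data sets it needs together with the application that owns each one. The hypothetical `data_inventory` helper inverts that mapping, so you can see per application which data sets are needed and by which test cases.

```python
from collections import defaultdict


def data_inventory(
    test_cases: dict[str, list[tuple[str, str]]],
) -> dict[str, dict[str, list[str]]]:
    """Group the data each test case needs by the application that owns it.

    test_cases maps a test case ID to (application, data set) pairs.
    Returns application -> data set -> IDs of the test cases that need it.
    """
    inventory: defaultdict[str, defaultdict[str, list[str]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for case_id, needs in test_cases.items():
        for application, dataset in needs:
            inventory[application][dataset].append(case_id)
    # Convert to plain dicts so the result compares and prints cleanly.
    return {app: dict(datasets) for app, datasets in inventory.items()}
```

For the workload management example, the applications would be the controller, the middleware and the database. Data sets shared by many test cases are the ones most worth setting up carefully and reusing.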

#2. Data setup to mirror the production environment

This is generally an extension of the previous step. It helps you understand what the end user or production scenario will be and what data it requires. Compare that data with the data that currently exists in the test environment. Based on this comparison, new data may need to be created or existing data modified.
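The comparison step can be sketched as reducing both data sources to the set of data categories they represent, then taking set differences. The category key (here country and account type) and the record layout are assumptions. Production records would normally be sampled and masked before being profiled this way.

```python
from typing import Callable, Iterable


def profile(records: Iterable[dict], key: Callable[[dict], str]) -> set[str]:
    """Reduce records to the set of distinct data categories they represent."""
    return {key(record) for record in records}


def compare_data_profiles(production: set[str], test_env: set[str]) -> dict[str, set[str]]:
    """Compare the data categories seen in production with those in the test environment."""
    return {
        "to_create": production - test_env,  # production scenarios with no test data yet
        "to_review": test_env - production,  # test data production no longer exercises
        "covered": production & test_env,
    }
```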

#3. Determination of the test data clean-up

Based on the testing requirements in the current release cycle (which can span a long time), test data may need to be altered or created as described in the previous point. Such test data may not be immediately relevant but may be required at a later point. Hence, a clear process should be formulated for deciding when test data can be cleaned up.
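A clean-up rule can be made explicit and automatable. The sketch assumes a policy where a data set becomes a clean-up candidate once it has been idle for more than a set number of release cycles, unless someone has explicitly marked it to be retained. The `DataSet` fields and the two-cycle default are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    last_used_cycle: int  # release cycle in which a test last used this data
    retain: bool = False  # explicitly kept, e.g. needed later in a long release cycle


def cleanup_candidates(
    datasets: list[DataSet], current_cycle: int, max_idle_cycles: int = 2
) -> list[str]:
    """Names of data sets idle for more than max_idle_cycles and not marked for retention."""
    return [
        d.name
        for d in datasets
        if not d.retain and current_cycle - d.last_used_cycle > max_idle_cycles
    ]
```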

#4. Identify sensitive data and protect it

In many cases, properly testing an application requires large amounts of very sensitive data. For example, a cloud-based test environment is a popular choice because it enables on-demand testing of different products. Yet even something as basic as guaranteeing user privacy in the cloud is a cause for concern. Especially where we need to replicate the user environment, a mechanism to shield sensitive data must be identified. The choice of mechanism is largely governed by the volume of test data used.
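One common shielding mechanism is pseudonymization, which replaces sensitive values with stable tokens. The sketch uses HMAC-SHA256 from Python's standard library and assumes that the secret key is kept outside the test environment. Which fields count as sensitive is an assumption you would define per application. Other techniques, such as synthetic data or format-preserving masking, may suit some volumes and formats better.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a sensitive value with a stable token.

    The same input always maps to the same token (HMAC-SHA256), but without the
    key the original value cannot be recovered or brute-forced from the token.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"tok_{digest[:16]}"


def mask_record(record: dict[str, str], sensitive_fields: set[str], secret_key: bytes) -> dict[str, str]:
    """Return a copy of the record with the listed fields pseudonymized."""
    return {
        k: pseudonymize(v, secret_key) if k in sensitive_fields else v
        for k, v in record.items()
    }
```

Because the same value always yields the same token, relationships between masked tables are preserved, for example when a masked customer ID is also used as a foreign key.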

#5. Automation

We already adopt automation for running repetitive tests, or for running the same tests with different kinds of data. In the same way, it’s also possible to automate the creation of test data. This helps expose any data-related errors that may occur during testing. One way to do this is to compare the results produced by the same set of data in consecutive test runs, and then automate that comparison.
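Here is a sketch of automated data creation with run-to-run comparison. It assumes a seeded random generator, so the same seed reproduces the same data within a given Python version. The order fields, `order_total` (which stands in for the code under test), `run_tests` and `compare_runs` are all hypothetical.

```python
import random


def generate_orders(seed: int, count: int = 5) -> list[dict]:
    """Synthetic orders; the same seed reproduces the same data."""
    rng = random.Random(seed)  # private generator, independent of global random state
    return [
        {
            "order_id": i,
            "quantity": rng.randint(1, 100),
            "unit_price": rng.choice([9.99, 19.99, 49.99]),
        }
        for i in range(count)
    ]


def order_total(order: dict) -> float:
    """Hypothetical code under test."""
    return round(order["quantity"] * order["unit_price"], 2)


def run_tests(seed: int) -> dict[int, float]:
    """Run the code under test over generated data; results keyed by order ID."""
    return {o["order_id"]: order_total(o) for o in generate_orders(seed)}


def compare_runs(previous: dict[int, float], current: dict[int, float]) -> list[int]:
    """Order IDs whose results differ between two runs on the same generated data."""
    ids = previous.keys() | current.keys()
    return sorted(i for i in ids if previous.get(i) != current.get(i))
```

A non-empty result from `compare_runs` for the same seed points to a change in behavior, or a data problem, that is worth investigating.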

#6. Effective data refresh using a central repository

This is by far the most important methodology and forms the heart of implementing data management. All the points above, especially those concerning data setup and data clean-up, relate to it directly or indirectly. A lot of effort in creating test data can be saved by maintaining a central repository containing all the kinds of data that may be required for various kinds of testing. How is this done? In each test cycle, for every new or modified test case, check whether the data already exists in the repository. If it does not, create that data and feed it into the test environment first.

Then add it to the repository for future reference. In subsequent release cycles, the test team can use all or a subset of this data. Isn’t the advantage apparent? Based on which data sets are frequently used, obsolete data can easily be eliminated. This ensures that the correct data is always present and reduces the cost of storing unneeded data. You can also keep a few saved versions of the repository, or revise it as necessary. Having different versions of the repository helps greatly in regression testing to identify which change in data causes the code to break.
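The workflow described above can be sketched as a small in-memory class. That workflow has four parts: reuse data if it exists and create it only when missing, snapshot versions, diff them, and drop data that was not used. `CentralDataRepository` and its methods are hypothetical. A real repository would live in a database or version-controlled storage, and this sketch only illustrates the logic.

```python
import copy
from typing import Any, Callable


class CentralDataRepository:
    """In-memory sketch of a central, versioned test data repository (hypothetical)."""

    def __init__(self) -> None:
        self._current: dict[str, Any] = {}
        self._versions: list[dict[str, Any]] = []
        self._usage: dict[str, int] = {}

    def get_or_add(self, key: str, create: Callable[[], Any]) -> Any:
        """Reuse existing data for a test case; create and store it only if missing."""
        if key not in self._current:
            self._current[key] = create()
        self._usage[key] = self._usage.get(key, 0) + 1
        return self._current[key]

    def put(self, key: str, data: Any) -> None:
        """Store new or revised data, e.g. for a modified test case."""
        self._current[key] = data

    def snapshot(self) -> int:
        """Freeze the current contents as a new version; returns its version number."""
        self._versions.append(copy.deepcopy(self._current))
        return len(self._versions) - 1

    def diff(self, old: int, new: int) -> dict[str, set[str]]:
        """Which data sets were added, removed or changed between two versions."""
        a, b = self._versions[old], self._versions[new]
        return {
            "added": b.keys() - a.keys(),
            "removed": a.keys() - b.keys(),
            "changed": {k for k in a.keys() & b.keys() if a[k] != b[k]},
        }

    def prune_unused(self, min_uses: int = 1) -> list[str]:
        """Drop data read fewer than min_uses times since the last prune."""
        stale = [k for k in self._current if self._usage.get(k, 0) < min_uses]
        for k in stale:
            del self._current[k]
        self._usage.clear()  # start counting afresh for the next cycle
        return stale
```

Diffing two snapshots is what supports the regression use case: if the code breaks between versions, the `changed` set narrows down which data to look at.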


The test environment should be of prime importance to every test team. Every release cycle brings a host of new challenges, and an unreliable, unplanned test environment only makes them harder to combat. To address this, many organizations are now putting strategies in place, such as forming dedicated Test Environment Maintenance teams. These teams establish frameworks for effectively maintaining test environments, to ensure smoother release cycles.

Improved testing is only the obvious effect of streamlining test data management. Its key benefit is that it provides a cost-effective solution for organizations without compromising the reliability of the product.

Let us know how you manage your test environment and prepare test data. Want to add any tips?


#1 Sheetal

Our test data efforts are mainly concentrated in the test case writing process. We are also thinking of using a tool for creating data. Do you know of any such tools?

#2 Vaibhav Srivastava


It’s really a very nice article and I am looking forward to more about it.

@Sheetal, try looking at ‘CA LISA for Test Data Management’. It’s a good tool.

Vaibhav S

#3 Rohit

Can we use a tool for test data generation? Do you know of any such tool?

#4 Saritha

Here is a commonly asked interview question; please let me know your best answer to it. As you are the expert, I wanted to know your answer.

Question: If there are a lot of test cases to be executed and very little testing time (true in most projects and companies), how would you, as a tester, ensure the quality of the product? How would you make sure that almost all the test cases are executed?

My answer: Based on technical and business risk, I would prioritize my test cases and go from there: high-risk areas are executed first, then medium, then low risk. If there is not enough time, it’s sometimes okay to skip low-risk areas.
Technical risk is given by Developers and Business risk either by Business analyst, customers, systems analyst depending on the organization.

Just wanted to know your opinion/answer based on your experience. Also let me know if my answer makes sense, or if you have any questions.

#5 Sneha

@Saritha: That answer is good, but here are some other points you can mention to elaborate on the risk analysis you describe when there isn’t sufficient time to test.

– What are the functions that are most apparent to the user?
– What are the functions that meet the specifications (happy path tests)?
– What kind of tests would cover multiple scenarios?
– What kind of functionality would have a large security impact?
– What kind of tests have been a problematic area or are error prone?
– What kind of tests are complex?
– What kind of functions have been coded in a rush?

And so on.

#6 Suganya

Is there any open source tool for “Test Data Management”?

#7 Carolyn

This was a very informative and helpful article (both Part 1 and Part 2). I appreciate that you want organizations to use this material and adapt it to their own use. Thank you so much – this was exactly what I was looking for :-)

#8 Peter

Both Part 1 and Part 2 do provide a very good overview of the domain. Thanks a lot for taking the effort to write the articles.

@Sheetal, another good tool is CA Test Data Management (formerly Gridtools).

#9 Danielle Felder

Great article. Following Peter’s comment, you might find real user reviews for CA Test Data Management on IT Central Station helpful.

As an example, this user writes that with this tool, “we’ve saved over eleven thousand hours in manual time of trying to create test data. In that same time, that eleven thousand hours has translated to about eleven million dollars in cost savings.”
