Database Testing – Properties of a Good Test Data and Test Data Preparation Techniques

A couple of months ago, I wrote about database testing strategies. That article dealt entirely with the execution of test cases, that is, black-box testing of a database. This article covers another important aspect of DB testing: preparing the test data.

Let us take a scenario here:

As a tester, you have to test the ‘Examination Results’ module of a university website. Assume that the whole application has been integrated and is in the ‘Ready for Testing’ state. The ‘Examination Results’ module is linked with the ‘Registration’, ‘Courses’ and ‘Finance’ modules.


Assume that you have adequate knowledge of the application and have created a comprehensive list of test scenarios. Now you have to design, document and execute the test cases. In the ‘Actions/Steps’ or ‘Test Inputs’ section of each test case, you have to specify the data to be used as input for the test. This data must be selected carefully, because the accuracy of the ‘Actual Results’ column of the TC document depends primarily on the test data. Preparing the input test data is therefore a significant step. Here is my rundown on “DB Testing – Test Data Preparation Strategies”.

Properties of Test Data:


The test data should be selected precisely and it must possess the following four qualities:

1. Realistic: Realistic data is accurate in the context of real-life scenarios. For example, to test the ‘Age’ field, all values should be positive and 18 or above, since candidates for admission to the university are usually at least 18 years old (the business requirements might define this differently).

Testing with realistic data makes the app more robust, because many of the bugs that real users would hit can be caught before release. Another advantage of realistic data is its reusability, which saves the time and effort of creating new data again and again.

While we are talking about realistic data, I would like to introduce the concept of the golden data set (GDS). A golden data set is one that covers almost all the scenarios that occur in the real project, so using it gives maximum test coverage. I use the GDS for regression testing in my organization, and it helps me test the scenarios that can occur once the code goes to production.

There are many test data generator tools on the market that analyze the column characteristics and user definitions in the database and generate realistic test data based on them. A few good examples of tools that generate data for database testing are DTM Data Generator, SQL Data Generator and Mockaroo.

2. Practically valid: This is similar to realistic but not the same. This property is more related to the business logic of the AUT (application under test). For example, the value 60 is realistic in the age field but practically invalid for a candidate for a Graduation or even a Masters program. In this case, a valid range would be 18-25 years (this might be defined in the requirements).
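
To make the difference between ‘realistic’ and ‘practically valid’ concrete, here is a minimal Python sketch. The bounds are assumptions: 18 as the minimum age and 18-25 as the valid range come from the examples above, while the upper limit of 120 for a realistic age and all function names are hypothetical. Real limits would come from the business requirements.

```python
# Assumed bounds for illustration only; real values come from requirements.
MIN_ADMISSION_AGE = 18            # minimum age used in the 'Realistic' example
MAX_HUMAN_AGE = 120               # assumed upper bound for a realistic age
VALID_AGE_RANGE = range(18, 26)   # 18-25 inclusive, the 'practically valid' range


def is_realistic(age: int) -> bool:
    """True if the age could plausibly belong to a real applicant."""
    return MIN_ADMISSION_AGE <= age <= MAX_HUMAN_AGE


def is_practically_valid(age: int) -> bool:
    """True if the age is realistic and also satisfies the assumed program rule."""
    return is_realistic(age) and age in VALID_AGE_RANGE


def classify(age: int) -> str:
    """Label a candidate test value for the 'Age' field."""
    if not is_realistic(age):
        return "unrealistic"
    if not is_practically_valid(age):
        return "realistic but practically invalid"
    return "realistic and practically valid"
```

Under these assumptions, `classify(60)` returns "realistic but practically invalid", which matches the example in the text.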

3. Versatile to cover scenarios: A single scenario may contain several subsequent conditions, so choose the data shrewdly to cover the maximum aspects of a scenario with the minimum set of data. For example, while creating test data for the result module, do not consider only regular students who are smoothly completing their program. Pay attention to students who are repeating the same course and who belong to different semesters or even different programs.


There might be several other interesting and tricky sub-conditions, e.g. the limit on the number of years to complete a degree program, passing a prerequisite course before registering for a course, or the maximum number of courses a student may enrol in during a single semester. Make sure to cover all these scenarios wisely with a finite set of data.
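
As an illustration of a small but versatile data set for the result module, the sketch below defines four made-up rows and a helper that reports which conditions they exercise. Every student ID, program code, course code and condition label here is hypothetical; a real data set would be derived from the actual schema and requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultRow:
    student_id: str
    program: str
    semester: int
    course: str
    attempt: int   # 1 = first attempt, 2 or more = repeating the course
    grade: str


# Hypothetical rows; all identifiers are invented for illustration.
DATASET = [
    ResultRow("S001", "BSCS", 3, "CS201", 1, "A"),  # regular student, passes
    ResultRow("S002", "BSCS", 5, "CS201", 2, "C"),  # repeating, later semester
    ResultRow("S003", "BBA", 4, "CS201", 2, "F"),   # repeating, different program
    ResultRow("S004", "BSCS", 3, "CS201", 1, "F"),  # regular student, fails
]


def covered_conditions(rows):
    """Return the set of condition labels that the given rows exercise."""
    covered = set()
    if any(r.attempt == 1 for r in rows):
        covered.add("first attempt")
    if any(r.attempt > 1 for r in rows):
        covered.add("repeat attempt")
    if len({r.program for r in rows}) > 1:
        covered.add("multiple programs")
    if len({r.semester for r in rows}) > 1:
        covered.add("multiple semesters")
    if any(r.grade == "F" for r in rows):
        covered.add("failing grade")
    return covered
```

A check like `covered_conditions(DATASET)` makes it easy to see whether a condition is missing before the test cases are executed, instead of discovering the gap afterwards.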


4. Exceptional data (if applicable/required): There may be certain exceptional scenarios that occur less frequently but demand close attention when they do occur, e.g. issues related to students with disabilities.

Another good example of an exceptional data set is shown in the image below:

[Image: Exceptional data]

Takeaway: Test data is good test data if it is realistic, valid and versatile. It is an added advantage if the data covers exceptional scenarios as well.

Test data preparation techniques:

We have briefly discussed the important properties of test data and explained why test data selection is important in database testing. Now let’s discuss the techniques to prepare test data.

There are only two ways to prepare test data:

Method 1. Insert New Data:

Get a clean DB and insert all the data as specified in your test cases. Once all the required data has been entered, start executing your test cases and fill in the ‘Pass/Fail’ columns by comparing the ‘Actual Output’ with the ‘Expected Output’. Sounds simple, right? But wait, it’s not that simple.

A few essential and critical concerns are as follows:

  • An empty instance of the database may not be available.
  • The inserted test data may be insufficient for some kinds of testing, such as performance and load testing.
  • Inserting the required test data into a blank DB is not an easy job because of dependencies between database tables. This unavoidable restriction can make data insertion a difficult task for the tester.
  • Inserting limited test data (just enough for the test cases) may hide issues that could be found only with a large data set.
  • Data insertion may require complex queries and/or procedures, and for these the tester would need sufficient help from the DB developer(s).

The five issues mentioned above are the most critical and most obvious drawbacks of this technique for test data preparation. But there are some advantages as well:

  • Execution of TCs becomes more efficient, as the DB contains only the required data.
  • Bug isolation takes little time, as only the data specified in the test cases is present in the DB.
  • Less time is required for testing and for comparing results.
  • The test process is clutter-free.
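
Method 1 and its table-dependency concern can be sketched with Python's built-in `sqlite3` module and an in-memory database. The three-table schema and the sample values are assumptions made up for this example and are not the schema of any real university application. The point is only that parent rows (students, courses) must exist before a dependent row (results) can be inserted.

```python
import sqlite3

# Hypothetical schema: results depends on both students and courses.
SCHEMA = """
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE courses  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE results (
    student_id INTEGER NOT NULL REFERENCES students(id),
    course_id  INTEGER NOT NULL REFERENCES courses(id),
    grade      TEXT NOT NULL
);
"""


def build_clean_db() -> sqlite3.Connection:
    """Create an empty in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def load_test_data(conn: sqlite3.Connection) -> None:
    """Insert exactly the rows a test case needs, parent tables first."""
    conn.execute("INSERT INTO students (id, name) VALUES (1, 'Test Student')")
    conn.execute("INSERT INTO courses (id, title) VALUES (10, 'Databases')")
    conn.execute(
        "INSERT INTO results (student_id, course_id, grade) VALUES (1, 10, 'B')"
    )
    conn.commit()
```

Trying to insert a result for a student that does not exist raises `sqlite3.IntegrityError`, which is the kind of dependency problem described in the third concern above.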

Method 2. Choose sample data subset from actual DB data:

This is the more feasible and practical technique for test data preparation. However, it requires sound technical skills and detailed knowledge of the DB schema and SQL. In this method, you copy production data and replace some field values with dummy values. This is the best data subset for your testing because it represents the production data. But it may not always be feasible because of data security and privacy issues.
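
The idea of copying a subset and replacing sensitive values with dummy values can be sketched as follows, again with `sqlite3`. The `students` table, its columns and the masking rules are assumptions for illustration. A real project would need to mask every sensitive column and follow its own privacy policy.

```python
import sqlite3


def mask_email(student_id: int) -> str:
    """Build a dummy address; example.com is reserved for documentation use."""
    return f"user{student_id}@example.com"


def copy_masked_subset(source: sqlite3.Connection,
                       target: sqlite3.Connection,
                       limit: int) -> int:
    """Copy up to `limit` student rows, replacing names and emails with dummies.

    Only non-sensitive columns (id, program) are read from the source.
    Returns the number of rows copied.
    """
    target.execute(
        "CREATE TABLE IF NOT EXISTS students "
        "(id INTEGER PRIMARY KEY, name TEXT, email TEXT, program TEXT)"
    )
    rows = source.execute(
        "SELECT id, program FROM students ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    target.executemany(
        "INSERT INTO students (id, name, email, program) VALUES (?, ?, ?, ?)",
        [(sid, f"Student {sid}", mask_email(sid), program) for sid, program in rows],
    )
    target.commit()
    return len(rows)
```

Because the real names and emails are never selected from the source, they cannot leak into the test database by accident.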

Takeaway: In the above section, we have discussed the test data preparation techniques. In short, there are two techniques: either create fresh data or select a subset of already existing data. Either way, the selected data should cover the various test scenarios, mainly valid and invalid tests, performance tests and null tests.

In the last section, let us take a quick tour of data generation approaches as well. These approaches are helpful when we need to generate new data.

Test Data Generation Approaches:

  • Manual test data generation: In this approach, testers enter the test data manually as per the test case requirements. It is a time-consuming process and is also prone to errors.
  • Automated test data generation: This is done with the help of data generation tools. The main advantages of this approach are its speed and accuracy. However, it comes at a higher cost than manual test data generation.
  • Back-end data injection: This is done through SQL queries. This approach can also update the existing data in the database. It is speedy and efficient but should be implemented very carefully so that the existing database does not get corrupted.
  • Using third-party tools: There are tools on the market that first understand your test scenarios and then generate or inject data accordingly to provide wide test coverage. These tools are accurate because they are customized to the business needs. But they are quite costly.
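
For a rough feel of automated generation without a dedicated tool, here is a minimal sketch that uses only Python's standard `random` module. The field names, program codes and grade letters are hypothetical, and the age range reuses the 18-25 example from earlier in the article. The fixed seed is a design choice: it makes every run produce the same rows, so a failing test can be reproduced.

```python
import random

PROGRAMS = ["BSCS", "BBA", "MSCS"]     # hypothetical program codes
GRADES = ["A", "B", "C", "D", "F"]     # hypothetical grade letters


def generate_students(count: int, seed: int = 42) -> list:
    """Return `count` synthetic student rows; the same seed gives the same rows."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [
        {
            "id": i,
            "age": rng.randint(18, 25),          # inclusive on both ends
            "program": rng.choice(PROGRAMS),
            "grade": rng.choice(GRADES),
        }
        for i in range(1, count + 1)
    ]
```

The generated rows could then be loaded through back-end data injection, for example with parameterized `INSERT` statements.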

Takeaway: There are 4 approaches to test data generation: manual, automated, back-end data injection and third-party tools. Each approach has its own pros and cons. You should select the approach that satisfies your business and testing needs.


This is a guest article by Rizwan Jafri.
The author has more than 4 years of experience and is currently working as a Sr. QA Engineer at Systems Limited, Lahore, Pakistan.

If you have any questions, please feel free to ask in the comment section below.

26 thoughts on “Database Testing – Properties of a Good Test Data and Test Data Preparation Techniques”

  1. nice one.
    just want to share one tip – before using production data make sure you mask all the data values. I faced a big problem due to this. We used the same DB for testing without masking the user emails and our testing resulted in actual emails to customers. This is a big no-no.

  2. Nice article.

    You mentioned: “There may be several subsequent conditions in a single scenario, so choose the data shrewdly to cover maximum aspects of a single scenario with minimum set of data,”

    I agree. An excellent way to accomplish this, which not enough testers know about, is through pairwise testing (and more thorough combinatorial testing). Entering the test inputs into a pairwise test case generating tool, like Hexawise, can be an excellent way to thoroughly test a system with a minimal number of tests.

    In addition, such combinatorial tests will have a minimal amount of repetition from test to test.

    – Justin
    See, e.g.,:

  3. Thanks for the article. Are you suggesting back-end verification? If yes, can you please provide some examples which will stand while convincing management? Do you have some examples of bugs specific to the database used?

  4. Just wait for the next (and Last) article on this topic from me. All examples will be given.

    @ Justin: Thank you so much for sharing the Hexawise URL, I have used it and found it an excellent tool.

    Thank you all for appreciation.


  5. Excellent usage but one good piece of advice: as we know, we grasp quickly when we are shown instances, so please introduce examples as well.


  6. Hi all…
    i m doing manual testing of web applications in a pvt company for 2,3 months. Unfortunately i have no senior to guide me…
    I need some practical test scenarios & Test cases (for guidance purpose.i.e. how to write test cases).
    would any of you like to help me? plzz send me some test cases on the following id…i’ll be vry thankful…

  7. Nice and informative article.

    Test data preparation and management has always been a challenge for the testing team. Use of the right test data during execution guarantees successful testing.

    Test data generation and preparation vary from application to application.

    The test data requirements for a multi-tier application, which needs data inputs from several other applications to generate test data for the application under test, will differ from those of a simpler application.

    In such cases, a Test Data Management tool can be used to create and manage test data.

    InfoSphere Optim Test Data Management Solution from IBM is one such Test Data Management tool. Very efficient and easy to learn.

  8. can you pls suggest something for those who have no technical background means knows nothing about testing.
    can u suggest something like which kind of testing field they can go in as there are many kinds of testing?
    can u pls share some info about automation and manual testing and which one is better for people with no technical background?


  9. Hi all…
    i m doing manual testing of web applications in a pvt company for 2 months.
    I need some practical test scenarios would any of you like to help me? please send me some test samples on the following id… will like to thank you so much…

  10. Also, testing tables and stored procedures dependencies will help to find relevant information. For instance, if a stored procedure is being called inside another procedure and some argument value is missing, then it might cause inconsistencies.

    Great post…Thanks!!

  11. For test data maintenance which is better either
    1. test data in excel or
    2. test data in database because its a huge application having many tables and relations between them.
