Database Testing – Properties of Good Test Data and Test Data Preparation Techniques

A couple of months ago, I wrote about database testing strategies. That article dealt entirely with the execution of test cases, that is, the black-box testing of a database. There is another important aspect of DB testing, which we will cover in this article.

Let us take a scenario here:

As a tester, you have to test the ‘Examination Results’ module of a university website. Assume that the whole application has been integrated and is in the ‘Ready for Testing’ state. The ‘Examination’ module is linked with the ‘Registration’, ‘Courses’ and ‘Finance’ modules.

Assume that you have adequate information about the application and have created a comprehensive list of test scenarios. Now you have to design, document and execute the test cases. In the ‘Actions/Steps’ or ‘Test Inputs’ section of each test case, you will have to specify the acceptable data as input for the test. The data mentioned in the test cases must be selected properly, because the accuracy of the ‘Actual Results’ column of the TC document depends primarily on the test data. Preparing the input test data is therefore a significant step. Thus, here is my rundown on “DB Testing – Test Data Preparation Strategies”.

Properties of Test Data:

The test data should be selected precisely and it must possess the following four qualities:

1. Realistic: Realistic means the data should be accurate in the context of real-life scenarios. For example, to test the ‘Age’ field, all values should be positive and 18 or above, since candidates for admission to the university are usually at least 18 years old (this might be defined differently in the business requirements).

If testing is done with realistic test data, the app becomes more robust, as most of the possible bugs can be caught using realistic data. Another advantage of realistic data is its reusability, which saves the time and effort of creating new data again and again.

While we are talking about realistic data, I would like to introduce the concept of the golden data set (GDS). A golden data set is one that covers almost all the possible scenarios that occur in the real project. By using the GDS, we can provide maximum test coverage. I use the GDS for regression testing in my organization, and it helps me test the scenarios that can occur once the code goes into the production environment.

Many test data generator tools are available on the market. They analyze the column characteristics and user definitions in the database and, based on these, generate realistic test data for you. A few good examples of tools that generate data for database testing are DTM Data Generator, SQL Data Generator and Mockaroo.

2. Practically valid: This is similar to realistic but not the same. This property is more related to the business logic of the AUT (application under test). For example, the value 60 is realistic in the age field but practically invalid for a candidate for a Graduation or even a Masters program. In this case, a valid range would be 18-25 years (this might be defined in the requirements).
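The difference between realistic and practically valid can be sketched in a few lines of Python. This is only an illustration: the 18-25 range comes from the example above, while the upper limit of 120 for a realistic age and the function name `classify_age` are assumptions made for this sketch, not business rules of any real application.

```python
# Assumed bounds, for illustration only. Real limits come from the requirements.
REALISTIC_MIN_AGE = 18   # "positive and 18 or above" from the article's example
REALISTIC_MAX_AGE = 120  # assumption: a generous upper bound for a human age
VALID_MIN_AGE = 18       # practically valid range from the article's example
VALID_MAX_AGE = 25


def classify_age(age):
    """Classify an age value as 'unrealistic', 'realistic but invalid' or 'valid'."""
    # Reject anything that is not a plain integer (bool is a subclass of int).
    if not isinstance(age, int) or isinstance(age, bool):
        return "unrealistic"
    if age < REALISTIC_MIN_AGE or age > REALISTIC_MAX_AGE:
        return "unrealistic"
    if VALID_MIN_AGE <= age <= VALID_MAX_AGE:
        return "valid"
    return "realistic but invalid"


for candidate in (-5, 17, 18, 25, 26, 60):
    print(candidate, "->", classify_age(candidate))
```

A test data set for the ‘Age’ field would normally draw values from all three classes, so that both the acceptance and the rejection paths of the application are exercised.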

3. Versatile to cover scenarios: A single scenario may contain several subsequent conditions, so choose the data shrewdly to cover the maximum aspects of the scenario with the minimum set of data. For example, while creating test data for the result module, do not consider only regular students who are smoothly completing their program. Pay attention to students who are repeating the same course and belong to different semesters or even different programs. The data set may look like this:

Sr#  Student_ID                 Program_ID  Course_ID  Grade
1    BCS-Fall2011-Morning-01    BCS-F11     CS-401     A
2    BCS-Spring2011-Evening-14  BCS-S11     CS-401     B+
3    MIT-Fall2010-Afternoon-09  MIT-F10     CS-401     A-

There might be several other interesting and tricky sub-conditions, e.g. the limit on the number of years allowed to complete a degree program, passing a prerequisite course before registering for a course, or the maximum number of courses a student may enrol in during a single semester. Make sure to cover all these scenarios wisely with a finite set of data.
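One way to check that a small data set really is versatile is to load it into a scratch database and count what it covers. The sketch below uses Python's built-in `sqlite3` module with the three sample rows from the table above. The table name `results` and the helper names are assumptions made for this example, not part of any real schema.

```python
import sqlite3

# The three sample rows from the article's data set.
ROWS = [
    ("BCS-Fall2011-Morning-01", "BCS-F11", "CS-401", "A"),
    ("BCS-Spring2011-Evening-14", "BCS-S11", "CS-401", "B+"),
    ("MIT-Fall2010-Afternoon-09", "MIT-F10", "CS-401", "A-"),
]


def load_results(conn):
    """Create a hypothetical 'results' table and insert the sample rows."""
    conn.execute(
        "CREATE TABLE results ("
        "student_id TEXT, program_id TEXT, course_id TEXT, grade TEXT)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", ROWS)


def programs_per_course(conn):
    """Return {course_id: number of distinct programs with a result in it}."""
    cur = conn.execute(
        "SELECT course_id, COUNT(DISTINCT program_id) "
        "FROM results GROUP BY course_id"
    )
    return dict(cur.fetchall())


conn = sqlite3.connect(":memory:")
load_results(conn)
coverage = programs_per_course(conn)
print(coverage)  # {'CS-401': 3}
```

Three rows are enough here to show one course taken by students from three different programs. The same kind of query can be repeated for other sub-conditions, such as prerequisites or course limits per semester.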



4. Exceptional data (if applicable/required): Certain exceptional scenarios occur less frequently but demand close attention when they do occur, e.g. issues related to disabled students.

Another good explanation and example of an exceptional data set is shown in the image below:

Takeaway: Test data is good test data if it is realistic, valid and versatile. It is an added advantage if the data covers exceptional scenarios as well.

Test data preparation techniques:

We have briefly discussed the important properties of test data and explained why test data selection matters in database testing. Now let’s discuss the techniques for preparing test data.

There are only two ways to prepare test data:

Method 1. Insert New Data:

Get a clean DB and insert all the data specified in your test cases. Once all the required data has been entered, start executing your test cases and fill in the ‘Pass/Fail’ columns by comparing the ‘Actual Output’ with the ‘Expected Output’. Sounds simple, right? But wait, it’s not that simple.
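The workflow just described can be sketched with Python's built-in `sqlite3` module. Everything specific here is an assumption for illustration: the `results` table, the student IDs, and `lookup_grade`, which stands in for whatever feature of the application is actually under test.

```python
import sqlite3


def fresh_database():
    """Return a clean in-memory DB with a hypothetical 'results' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results (student_id TEXT, course_id TEXT, grade TEXT)"
    )
    return conn


def insert_test_data(conn, rows):
    """Insert the data specified in the test cases."""
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    conn.commit()


def lookup_grade(conn, student_id, course_id):
    """Stand-in for the feature under test: fetch a grade, or None if absent."""
    row = conn.execute(
        "SELECT grade FROM results WHERE student_id = ? AND course_id = ?",
        (student_id, course_id),
    ).fetchone()
    return row[0] if row else None


def run_test_cases(conn, cases):
    """Compare actual output with expected output and record Pass/Fail."""
    report = {}
    for case_id, student_id, course_id, expected in cases:
        actual = lookup_grade(conn, student_id, course_id)
        report[case_id] = "Pass" if actual == expected else "Fail"
    return report


# Made-up test data and test cases: (case id, student, course, expected grade).
TEST_DATA = [("S-001", "CS-401", "A"), ("S-002", "CS-401", "B+")]
TEST_CASES = [
    ("TC-1", "S-001", "CS-401", "A"),
    ("TC-2", "S-002", "CS-401", "B+"),
    ("TC-3", "S-999", "CS-401", None),  # student with no result on record
]

conn = fresh_database()
insert_test_data(conn, TEST_DATA)
report = run_test_cases(conn, TEST_CASES)
print(report)
```

Because the database starts empty, every row in it is known to the tester, which is what makes the expected output easy to state in advance.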

A few essential and critical concerns are as follows:

The five issues mentioned above are the most critical and most obvious drawbacks of this technique for test data preparation. However, there are some advantages as well:

Method 2. Choose sample data subset from actual DB data:

This is the more feasible and practical technique for test data preparation. However, it requires sound technical skills and detailed knowledge of the DB schema and SQL. In this method, you copy production data and replace some field values with dummy values. This is the best data subset for your testing, as it represents the production data. However, it may not always be feasible because of data security and privacy issues.
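A minimal sketch of this idea, again using `sqlite3`, is shown below. The `students` table, its columns, and the choice of which fields count as sensitive are all assumptions for illustration. A small in-memory table plays the role of the production database, since a real one cannot be shown here.

```python
import sqlite3


def build_fake_production():
    """Stand-in for a production DB (hypothetical 'students' table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE students ("
        "student_id INTEGER PRIMARY KEY, full_name TEXT, "
        "email TEXT, program_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO students VALUES (?, ?, ?, ?)",
        [
            (1, "Real Name One", "one@example.edu", "BCS-F11"),
            (2, "Real Name Two", "two@example.edu", "BCS-F11"),
            (3, "Real Name Three", "three@example.edu", "MIT-F10"),
        ],
    )
    return conn


def masked_subset(prod_conn, program_id):
    """Select one program's rows and replace sensitive fields with dummies."""
    rows = prod_conn.execute(
        "SELECT student_id, program_id FROM students "
        "WHERE program_id = ? ORDER BY student_id",
        (program_id,),
    ).fetchall()
    # Names and emails are never read from production; they are regenerated.
    return [
        {
            "student_id": sid,
            "full_name": f"Student {sid}",
            "email": f"student{sid}@test.invalid",
            "program_id": pid,
        }
        for sid, pid in rows
    ]


prod = build_fake_production()
subset = masked_subset(prod, "BCS-F11")
for row in subset:
    print(row)
```

The structural fields (IDs, program codes) keep their production values so that relationships between tables still hold, while the personal fields are replaced. Which fields must be masked is a decision for the data owner, not the tester alone.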

Takeaway: In this section, we have discussed the test data preparation techniques. In short, there are two techniques: either create fresh data or select a subset of existing data. Both need to be done in such a way that the selected data provides coverage for the various test scenarios, mainly valid and invalid tests, performance tests and null tests.

In the last section, let us take a quick tour of data generation approaches as well. These approaches are helpful when we need to generate new data.

Test Data Generation Approaches:

Takeaway: There are 4 approaches to test data generation – manual, automation, back-end data injection and third-party tools. Each approach has its own pros and cons. You should select the approach that satisfies your business and testing needs.
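As one illustration of combining the automation and back-end data injection approaches, the sketch below generates rows with a seeded random generator and writes them straight into a database table, bypassing the application's front end. The course codes, the grade list, the `STU-` ID format and the `results` table are assumptions made for this example.

```python
import random
import sqlite3

# Hypothetical value pools for this sketch.
GRADES = ["A", "A-", "B+", "B", "C", "F"]
COURSES = ["CS-401", "CS-402", "MT-201"]


def generate_rows(count, seed=42):
    """Generate reproducible (student_id, course_id, grade) rows."""
    rng = random.Random(seed)  # fixed seed, so reruns produce the same data
    return [
        (f"STU-{i:04d}", rng.choice(COURSES), rng.choice(GRADES))
        for i in range(1, count + 1)
    ]


def inject(conn, rows):
    """Insert rows directly into the back end and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(student_id TEXT, course_id TEXT, grade TEXT)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]


conn = sqlite3.connect(":memory:")
inserted = inject(conn, generate_rows(50))
print("rows in table:", inserted)
```

Seeding the generator keeps the data reproducible, so a failing test can be rerun against exactly the same rows.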

This is a guest article by Rizwan Jafri.

The author has more than 4 years of experience and is currently working as a Sr. QA Engineer at Systems Limited, Lahore, Pakistan.

If you have any questions, please feel free to ask in the comment section below.