ETL Testing / Data Warehouse Testing – Tips, Techniques, Process and Challenges

ETL Testing Process and Challenges:

Today, let me take a moment to explain to my testing fraternity one of the most in-demand and upcoming skills for my tester friends, i.e. ETL (Extract, Transform, and Load) testing.

This article will give you a complete picture of ETL testing and what we do to test the ETL process.

It has been observed that Independent Verification and Validation has huge market potential, and many companies now see it as a prospective business opportunity. Customers are offered a wide range of service offerings, spread across many areas based on technology, process, and solutions. ETL or data warehousing is one of the offerings that is developing rapidly and successfully.

Through the ETL process, data is extracted from the source systems, transformed according to business rules, and finally loaded into the target system (the data warehouse). A data warehouse is an enterprise-wide store of integrated data that aids the business decision-making process. It is part of business intelligence.
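To make the three steps concrete, here is a minimal Python sketch of one extract-transform-load run. It uses the standard library's `sqlite3` in-memory database as a stand-in for the warehouse. The source rows, the `fact_sales` table, and the two business rules (upper-casing country codes, converting cents to a decimal amount) are hypothetical and only illustrate the idea; a real ETL tool would read from actual source systems and apply the project's own rules.

```python
import sqlite3

# Hypothetical source records: (customer_id, country_code, amount_in_cents).
SOURCE_ROWS = [
    (1, "us", 1250),
    (2, "de", 990),
    (3, None, 500),  # invalid: country code is missing
]


def extract():
    """Extract: a real pipeline would query or read the source systems here."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: apply the (hypothetical) business rules.

    Valid rows are returned for loading; invalid rows are collected with a
    reason so they can be reported instead of silently dropped.
    """
    clean, rejected = [], []
    for customer_id, country, cents in rows:
        if country is None:
            rejected.append((customer_id, "missing country_code"))
            continue
        clean.append((customer_id, country.upper(), cents / 100))
    return clean, rejected


def load(conn, rows):
    """Load: write the transformed rows into the target (warehouse) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(customer_id INTEGER PRIMARY KEY, country_code TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run extract -> transform -> load and return the rejected rows."""
    clean, rejected = transform(extract())
    load(conn, clean)
    return rejected


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rejected = run_pipeline(conn)
    print(conn.execute("SELECT * FROM fact_sales ORDER BY customer_id").fetchall())
    print("rejected:", rejected)
```

Running it loads customers 1 and 2 and reports customer 3 as rejected. Verifying exactly this kind of behaviour (correct transformation, nothing lost, invalid data rejected and reported) is what ETL testing is about.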


Why do organizations need Data Warehouse?

Organizations with organized IT practices are looking to achieve the next level of technology transformation. They are now trying to make their operations more efficient with data that is easy to interoperate. Data, whether everyday or historical, is the most important asset of any organization. Data is the backbone of every report, and reports are the baseline on which all vital management decisions are taken.

Most companies are taking steps to build a data warehouse to store and monitor real-time as well as historical data. Crafting an efficient data warehouse is not an easy job. Many organizations have distributed departments with different applications running on different technologies. An ETL tool is employed to integrate the data sources of these departments seamlessly. The ETL tool works as an integrator: it extracts data from the different sources, transforms it into the preferred format based on the business transformation rules, and loads it into a cohesive database known as a data warehouse.

A well-planned, well-defined, and effective testing scope helps ensure a smooth transition of the project to production. A business gains real confidence once the ETL processes are verified and validated by an independent group of experts to make sure that the data warehouse is concrete and robust.

ETL or data warehouse testing is categorized into four different engagements, irrespective of the technology or ETL tools used:

The diagram below explains ETL testing and how it relates to the ETL process:


ETL Testing Techniques:

1) Data Transformation Testing: Verify that data is transformed correctly according to the various business requirements and rules.

2) Source to Target Count Testing: Make sure that the count of records loaded into the target matches the expected count.

3) Source to Target Data Testing: Make sure that all expected data is loaded into the data warehouse without any data loss or truncation.

4) Data Quality Testing: Make sure that the ETL application appropriately rejects invalid data, replaces it with default values where required, and reports it.

5) Performance Testing: Make sure that data is loaded into the data warehouse within the prescribed and expected time frames, confirming acceptable performance and scalability.

6) Production Validation Testing: Validate the data in the production system and compare it against the source data.

7) Data Integration Testing: Make sure that the data from the various sources has been loaded properly into the target system and that all threshold values are checked.

8) Application Migration Testing: Ensure that the ETL application works correctly after it is moved to a new box (server) or platform.

9) Data and Constraint Check: The data type, length, indexes, constraints, etc. are tested.

10) Duplicate Data Check: Check whether any duplicate data is present in the target system. Duplicate data can lead to wrong analytical reports.
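Two of the techniques listed above, source to target count testing and the duplicate data check, usually boil down to simple SQL. The sketch below runs them through Python's `sqlite3` against an in-memory database; the `src_customer` and `dw_customer` tables and their contents are hypothetical, with a defect planted on purpose so the checks have something to find.

```python
import sqlite3


def build_demo_db():
    """Create hypothetical source and target tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE src_customer (customer_id INTEGER, name TEXT);
        CREATE TABLE dw_customer  (customer_id INTEGER, name TEXT);
        INSERT INTO src_customer VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
        -- Simulated defect: customer 2 is loaded twice and customer 3 is lost.
        INSERT INTO dw_customer  VALUES (1, 'Asha'), (2, 'Ben'), (2, 'Ben');
        """
    )
    return conn


# Table and column names are interpolated into the SQL, so they must come from
# trusted test configuration, never from user input.

def count_check(conn, source, target):
    """Source to target count test: compare the row counts of two tables."""
    src = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    return src, tgt, src == tgt


def duplicate_check(conn, table, key):
    """Duplicate data check: return (key, occurrences) for keys seen more than once."""
    sql = f"SELECT {key}, COUNT(*) FROM {table} GROUP BY {key} HAVING COUNT(*) > 1"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(count_check(conn, "src_customer", "dw_customer"))
    print(duplicate_check(conn, "dw_customer", "customer_id"))
```

With this data the count check sees 3 source rows and 3 target rows and passes, even though customer 3 never arrived; only the duplicate check exposes the problem (customer 2 appears twice). This is why a count match alone is not enough, and count testing is normally combined with data-level and duplicate checks.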

Apart from the above ETL testing methods, other testing methods such as system integration testing, user acceptance testing, incremental testing, regression testing, retesting, and navigation testing are also carried out to make sure everything is smooth and reliable.



ETL Testing Process:

Like any other testing under Independent Verification and Validation, ETL testing goes through the same phases.

The first two phases, i.e. requirement understanding and validation, can be regarded as pre-steps of the ETL testing process. So, the main process can be represented as below:

It is necessary to define a test strategy, mutually accepted by the stakeholders, before starting actual testing. A well-defined test strategy makes sure that the correct approach is followed to meet the testing goals. ETL testing might require the testing team to write SQL statements extensively or to tailor the SQL provided by the development team. In either case, the testing team must know what results they expect those SQL statements to return.
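As an example of knowing the expected result of a test query: one common pattern is to express the documented transformation rule as SQL over the source, then compare that with the target using `EXCEPT`. The expected result is an empty set in both directions. The sketch below assumes hypothetical `src_orders` and `dw_orders` tables and a hypothetical rule `amount = amount_cents / 100.0`, and uses Python's `sqlite3` so it runs without a database server.

```python
import sqlite3

# The documented (hypothetical) rule, expressed as SQL over the source.
EXPECTED_SQL = "SELECT order_id, amount_cents / 100.0 FROM src_orders"
# What the ETL job actually loaded.
ACTUAL_SQL = "SELECT order_id, amount FROM dw_orders"


def build_orders_db():
    """Create hypothetical source and target tables with one bad row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE src_orders (order_id INTEGER, amount_cents INTEGER);
        CREATE TABLE dw_orders  (order_id INTEGER, amount REAL);
        INSERT INTO src_orders VALUES (1, 1250), (2, 500), (3, 775);
        -- Simulated defect: order 3 was divided by 10 instead of 100.
        INSERT INTO dw_orders  VALUES (1, 12.5), (2, 5.0), (3, 77.5);
        """
    )
    return conn


def compare(conn, expected_sql, actual_sql):
    """Return (missing, unexpected) rows; both lists should be empty.

    missing:    expected rows not found in the target (data loss or wrong values)
    unexpected: target rows not produced by the rule (wrong or extra data)
    EXCEPT removes duplicate rows, so it does not replace a duplicate check.
    """
    missing = conn.execute(f"{expected_sql} EXCEPT {actual_sql}").fetchall()
    unexpected = conn.execute(f"{actual_sql} EXCEPT {expected_sql}").fetchall()
    return missing, unexpected


if __name__ == "__main__":
    conn = build_orders_db()
    missing, unexpected = compare(conn, EXPECTED_SQL, ACTUAL_SQL)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

For the demo data, order 3 shows up on both sides: the expected value 7.75 is missing and the loaded value 77.5 is unexpected. The exact comparison works here because the demo values are exactly representable; with real monetary or floating-point columns it is usually safer to compare rounded values (for example with SQL `ROUND(...)`) or integer amounts.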

Difference between Database and Data Warehouse Testing

There is a popular misconception that database testing and data warehouse testing are similar, whereas in fact the two take different directions.

There are a number of universal verifications that have to be carried out for any kind of data warehouse testing. Below is the list of items that are treated as essential for validation in ETL testing:

– Verify that data transformation from source to destination works as expected

– Verify that the expected data is added to the target system

– Verify that all DB fields and field data are loaded without any truncation

– Verify that record counts and data checksums match between source and target

– Verify that proper error logs, with all details, are generated for rejected data

– Verify the handling of NULL value fields

– Verify that duplicate data is not loaded

– Verify data integrity
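Several of these checks are easy to automate. The Python sketch below assumes hypothetical `src_address` and `dw_address` tables in an in-memory SQLite database. It computes a row count plus an order-independent SHA-256 checksum for source and target, counts NULLs in a column, and lists values that became shorter in the target (a sign of truncation). The checksum hashes each row's Python `repr`, so both queries must return the same types (for example, integers on both sides rather than `1` versus `1.0`) for matching data to produce matching checksums.

```python
import hashlib
import sqlite3


def build_address_db():
    """Create hypothetical source and target tables with a truncated city."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE src_address (id INTEGER, city TEXT);
        CREATE TABLE dw_address  (id INTEGER, city TEXT);
        INSERT INTO src_address VALUES (1, 'Pune'), (2, 'Alexandria'), (3, NULL);
        -- Simulated defect: the target column cut 'Alexandria' to 5 characters.
        INSERT INTO dw_address  VALUES (1, 'Pune'), (2, 'Alexa'), (3, NULL);
        """
    )
    return conn


def fingerprint(conn, sql):
    """Return (row_count, checksum) for a query's result set.

    Rows are hashed in sorted order of their repr, so the checksum does not
    depend on the order in which the database returns them.
    """
    rows = conn.execute(sql).fetchall()
    digest = hashlib.sha256()
    for text in sorted(repr(row) for row in rows):
        digest.update(text.encode("utf-8") + b"\n")
    return len(rows), digest.hexdigest()


# Table and column names below are interpolated into SQL, so they must come
# from trusted test configuration, never from user input.

def null_count(conn, table, column):
    """NULL check: number of rows where the column is NULL."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


def truncated_values(conn, source, target, key, column):
    """Truncation check: rows whose target value is shorter than the source value."""
    sql = f"""
        SELECT s.{key}, s.{column}, t.{column}
        FROM {source} AS s JOIN {target} AS t ON s.{key} = t.{key}
        WHERE LENGTH(t.{column}) < LENGTH(s.{column})
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_address_db()
    print(fingerprint(conn, "SELECT id, city FROM src_address"))
    print(fingerprint(conn, "SELECT id, city FROM dw_address"))
    print("NULL cities in target:", null_count(conn, "dw_address", "city"))
    print("truncated:", truncated_values(conn, "src_address", "dw_address", "id", "city"))
```

For the demo data the row counts match (3 and 3) but the checksums differ, and the truncation check points at row 2. Whether the one NULL city is a defect depends on the business rule for that field, which is why the NULL check reports a count rather than a pass or fail.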

We have included a link to an article in the Further Reading section that will help you understand the difference between ETL/data warehouse testing and database testing very clearly. The article is titled “ETL vs. DB Testing – A Closer Look at ETL Testing Need, Planning and ETL Tools”.

ETL Testing Challenges:

ETL testing is quite different from conventional testing, and there are many challenges in performing data warehouse testing. Here is a list of a few ETL testing challenges I experienced on my project:

– Incompatible and duplicate data.

– Loss of data during ETL process.

– Unavailability of an inclusive test bed.

– Testers have no privileges to execute ETL jobs on their own.

– The volume and complexity of the data are huge.

– Faults in business processes and procedures.

– Trouble acquiring and building test data.

– Unstable testing environment.

– Missing business flow information.

Data is important for businesses to make critical business decisions. ETL testing plays a significant role in validating and ensuring that business information is accurate, consistent, and reliable. It also minimizes the risk of data loss in production.

I hope these tips help ensure that your ETL process is accurate and that the data warehouse it builds is a competitive advantage for your business.

Further Reading:

This is a guest post by Vishal Chhaperia, who works in an MNC in a test management role. He has extensive experience managing multi-technology QA projects, processes, and teams.

Have you worked on ETL testing? Please share your ETL/DW testing tips and challenges below.