Software testing covers a variety of areas. The major varieties are functional and non-functional testing. Functional testing is the procedural way to ensure that the functionality developed works as expected. Non-functional testing is the approach by which non-functional aspects, such as performance, can be ensured to be at an acceptable level.
There is another flavor of testing called DB testing. Data is organized in the database in the form of tables. In a business, there can be flows where data from multiple tables is merged or processed into a single table, and vice versa.
ETL (Extract, Transform, Load) testing is another kind of testing, preferred in business cases where the clients need some kind of reporting. The reporting is needed to analyze demand, needs and supply, so that the clients, the business and the end users are well served and benefit.
What will you learn in this tutorial?
In this tutorial, you will learn what database testing is, what ETL testing is, the difference between DB testing and ETL testing, and more details about the need for ETL testing, its process and its planning, with real examples.
DB Testing vs. ETL Testing
Many of us are a little confused and assume that database testing and ETL testing are the same. In fact, they are similar but not the same.
DB testing is used extensively in business flows where multiple data flows occur in the application, from multiple data sources into a single table. A data source can be a table, a flat file, an application or anything else that yields output data. In turn, the output data obtained can be used as input for the next step of the business flow. Hence, when we perform DB testing, the most important things to capture are how the data is transformed from the source and how it is saved in the destination.
Synchronization is a major and essential consideration when performing DB testing. Depending on the application's position in the architectural flow, there might be a few issues with data or DB synchronization. This has to be taken care of during testing, as it helps avoid raising invalid defects or bugs.
For example, Project “A” has an integrated architecture in which a particular application uses data from several other heterogeneous data sources. Hence, the integrity of this data at the destination location has to be verified, along with validations for the following:
- Primary key and foreign key validation
- Integrity of column values
- Null values in any columns
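The three validations above can each be expressed as a plain SQL query. The following is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database; the `customer` and `purchase` tables, their columns and the helper names are hypothetical, chosen so that each check finds exactly one problem. Table and column names are interpolated into the SQL text, so this assumes they come from a trusted mapping sheet, never from user input.

```python
import sqlite3

# Hypothetical destination tables: purchase.customer_id should reference customer.customer_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER, name TEXT, membership_id TEXT);
CREATE TABLE purchase (purchase_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'Asha', 'M001'), (2, 'Ravi', NULL), (2, 'Ravi', 'M002');
INSERT INTO purchase VALUES (10, 1, 250.0), (11, 3, 99.5);
""")


def duplicate_keys(conn, table, key):
    """Primary key check: key values that occur more than once."""
    sql = f"SELECT {key} FROM {table} GROUP BY {key} HAVING COUNT(*) > 1"
    return [row[0] for row in conn.execute(sql)]


def orphan_keys(conn, child, fk, parent, pk):
    """Foreign key check: non-null child keys with no matching parent row."""
    sql = (f"SELECT c.{fk} FROM {child} c LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
           f"WHERE p.{pk} IS NULL AND c.{fk} IS NOT NULL")
    return [row[0] for row in conn.execute(sql)]


def null_count(conn, table, column):
    """Null check: number of rows where the column is NULL."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


print(duplicate_keys(conn, "customer", "customer_id"))                          # [2]
print(orphan_keys(conn, "purchase", "customer_id", "customer", "customer_id"))  # [3]
print(null_count(conn, "customer", "membership_id"))                            # 1
```

In a real project the same queries would be pointed at the destination database; returning the offending key values, rather than just a pass/fail flag, makes them easy to attach to a defect report.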
What is ETL Testing?
ETL testing is a special type of testing that clients want done for forecasting and analysing their business. It is mostly used for reporting purposes. For instance, if the clients need reports on the customers who buy their product, based on the day they purchase, they have to make use of ETL reports.
After analysis and reporting, this data is moved into a data warehouse, where the old historical business data is kept.
ETL testing is multi-level testing, as the data from the source is transformed through multiple environments before it reaches its final destination.
Let us consider a group “A” running a retail customer business through a shopping market, where customers can purchase any household items required for their day-to-day needs. Every visiting customer is given a unique membership ID, with which they gain points each time they purchase things from the shopping market. The group's regulations say that the points gained expire every year. Depending on their usage, a membership can be either upgraded to a higher grade or downgraded to a lower grade relative to the current one. Five years after establishing the shopping market, management is now looking to scale up the business along with its revenue.
Hence they require a few business reports so that they can run promotions for their customers.
In database testing we perform the following:
1) Validation of the target tables, whose columns are created with logical calculations as described in the logical mapping sheet and the data routing document.
2) Manipulations such as inserting, updating and deleting customer data can be performed through any end-user POS application in an integrated system, along with the back-end database, so that the same changes are reflected in the end system.
3) DB testing has to ensure that no customer data has been misinterpreted or truncated. Such errors might lead to serious issues, like incorrect mapping of customer data to their loyalty points.
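One simple way to catch truncated or misinterpreted values is to compare, field by field, what the POS application sent with what reached the back-end database. The sketch below assumes both sides have already been fetched into dictionaries keyed by customer ID; the sample names and the `compare_fields` helper are hypothetical.

```python
# Hypothetical data: customer names as entered in the POS application (source)
# and as found in the back-end database (target), keyed by customer id.
pos_records = {101: "Lakshmi Narayanan", 102: "Priya", 103: "John", 104: "Kumar"}
db_records = {101: "Lakshmi Nara", 102: "Priya", 103: "Jon"}


def compare_fields(source, target):
    """Classify each source key whose target value differs from the source value."""
    issues = {}
    for key, src_value in source.items():
        if key not in target:
            issues[key] = "missing"
        elif target[key] == src_value:
            continue
        elif src_value.startswith(target[key]):
            # The target holds only a prefix of the source value: likely truncation,
            # e.g. a target column that is narrower than the source column.
            issues[key] = "truncated"
        else:
            issues[key] = "mismatched"
    return issues


print(compare_fields(pos_records, db_records))
# {101: 'truncated', 103: 'mismatched', 104: 'missing'}
```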
In ETL testing we check for the following:
1) Assuming there are 100 customers in the source, you check whether all these customers, along with their data in the 100 rows, have been moved from the source system to the target. This is known as a data completeness check.
2) Checking whether the customer data in those 100 rows has been correctly transformed and represented. This is called a data accuracy check.
3) Reports for the customers who have gained more than x points within a particular period.
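The completeness check, the accuracy check and the points report can each be written as a query. This is a minimal sketch assuming hypothetical `src_customer` and `tgt_customer` tables with identical columns in a SQLite in-memory database; the accuracy check uses SQL's `EXCEPT` operator to list source rows that have no identical row in the target.

```python
import sqlite3

# Hypothetical source and target tables with identical columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customer (customer_id INTEGER, name TEXT, points INTEGER, txn_date TEXT);
CREATE TABLE tgt_customer (customer_id INTEGER, name TEXT, points INTEGER, txn_date TEXT);
INSERT INTO src_customer VALUES
  (1, 'Asha', 120, '2024-01-15'), (2, 'Ravi', 40, '2024-02-03'), (3, 'Meena', 300, '2024-03-20');
INSERT INTO tgt_customer VALUES
  (1, 'Asha', 120, '2024-01-15'), (2, 'Ravi', 45, '2024-02-03'), (3, 'Meena', 300, '2024-03-20');
""")


def is_complete(conn):
    """Data completeness: the target holds as many rows as the source."""
    src = conn.execute("SELECT COUNT(*) FROM src_customer").fetchone()[0]
    tgt = conn.execute("SELECT COUNT(*) FROM tgt_customer").fetchone()[0]
    return src == tgt


def inaccurate_rows(conn):
    """Data accuracy: source rows with no identical row in the target."""
    return conn.execute(
        "SELECT * FROM src_customer EXCEPT SELECT * FROM tgt_customer ORDER BY 1"
    ).fetchall()


def points_report(conn, x, start, end):
    """Customers who gained more than x points between start and end (inclusive)."""
    return conn.execute(
        """SELECT customer_id, name, SUM(points) FROM tgt_customer
           WHERE txn_date BETWEEN ? AND ?
           GROUP BY customer_id, name
           HAVING SUM(points) > ?
           ORDER BY customer_id""",
        (start, end, x),
    ).fetchall()


print(is_complete(conn))                                     # True
print(inaccurate_rows(conn))                                 # [(2, 'Ravi', 40, '2024-02-03')]
print(points_report(conn, 100, "2024-01-01", "2024-03-31"))  # [(1, 'Asha', 120), (3, 'Meena', 300)]
```

`EXCEPT` compares distinct rows only, so duplicated rows would slip through; running it in both directions (source minus target and target minus source) together with the count check gives better coverage.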
Comparative study of ETL and DB testing
ETL and DB testing differ in a few aspects that are essential to understand before performing them. Understanding these differences helps us see the value and significance of each type of testing and the way it helps the business.
The following table describes the basic characteristics of both types of testing.
| Aspect | DB Testing | ETL Testing |
|---|---|---|
| Primary goal | Data integration | BI reporting |
| Applicable place | In the functional system where the business flow occurs | External to the business flow environment; the input is historical business data |
| Automation tool | QTP, Selenium | Informatica, QuerySurge, COGNOS |
| Business impact | Impacts can be severe, as it covers the integrated architecture of the business flows | Potential impacts arise when the clients want forecasting and analysis to be done |
| Modelling used | Entity relationship | Dimensional |
| System | Online Transaction Processing (OLTP) | Online Analytical Processing (OLAP) |
| Data nature | Normalized data is used | Denormalized data is used |
Why should a business go for ETL?
There are plenty of business needs for which ETL testing is worth considering. Every business has its own unique mission and line of business, and every product goes through a life cycle that takes a generic form.
A new product typically enters the market with tremendous growth in sales until it reaches a stage called maturity, after which its sales decline. This gradual change brings a definite drop in business growth. Hence it is important to analyze customer needs and the other factors required for business growth, to make the organisation more profitable. So in practice, clients want to analyze historical data and strategically come up with reports.
ETL test planning
One of the main steps in ETL testing is planning the tests that are going to be executed. The plan is similar to the usual system test plan, except for a few attributes such as requirements and test cases.
Here the requirements are essentially a mapping sheet that maps data between different databases. Since ETL testing happens at multiple levels, various mappings are needed to validate it.
Most of the time, data is not captured directly from the source databases. Instead, the source tables are exposed through views, from which the data is read.
Example: mappings can be provided as pairs of VIEW_NAME and TABLE_NAME columns, where VIEW_NAME represents the view used to read data from the source and TABLE_NAME represents the table in the ETL environment. It is advisable to maintain a naming convention, as it helps when planning for automation. A generic notation is simply to prefix each name with the name of its environment.
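Because the naming convention is what makes automation possible, it can itself be checked mechanically. The sketch below assumes a hypothetical convention in which source views are prefixed `SRC_V_` and ETL tables `ETL_`; it flags mapping rows whose view and table do not share the same base name, and generates a row-count reconciliation query for each mapped pair. The prefixes and object names are illustrative only.

```python
# Hypothetical mapping rows: (VIEW_NAME in the source, TABLE_NAME in the ETL environment).
MAPPING = [
    ("SRC_V_CUSTOMER", "ETL_CUSTOMER"),
    ("SRC_V_PURCHASE", "ETL_PURCHASE"),
    ("SRC_V_LOYALTY", "ETL_LOYALTY_PTS"),
]

SOURCE_PREFIX = "SRC_V_"  # assumed environment prefix for source views
TARGET_PREFIX = "ETL_"    # assumed environment prefix for ETL tables


def base_name(name, prefix):
    """Strip the environment prefix, or return None if the prefix is missing."""
    return name[len(prefix):] if name.startswith(prefix) else None


def check_naming(mapping):
    """Return mapping rows whose view and table do not share a base name."""
    bad = []
    for view, table in mapping:
        v_base = base_name(view, SOURCE_PREFIX)
        t_base = base_name(table, TARGET_PREFIX)
        if v_base is None or t_base is None or v_base != t_base:
            bad.append((view, table))
    return bad


def count_queries(mapping):
    """Generate one row-count reconciliation query per mapped pair."""
    return [
        f"SELECT (SELECT COUNT(*) FROM {view}) - (SELECT COUNT(*) FROM {table}) AS diff"
        for view, table in mapping
    ]


print(check_naming(MAPPING))  # [('SRC_V_LOYALTY', 'ETL_LOYALTY_PTS')]
for query in count_queries(MAPPING):
    print(query)
```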
The most significant step in ETL is identifying the essential data and tables in the source. The next essential step is mapping the tables from the source to the ETL environment.
The mapping between tables in the various environments follows the ETL flow: data moves from the source table to the stage table, from there to the tables in the EDW (enterprise data warehouse), and then to OLAP, which is the final reporting environment. Hence data synchronization between these layers is very important at every point of the ETL process.
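One way to check synchronization across the layers is to compare a control total, such as the sum of a measure, at every hop. Unlike a row count, a sum remains comparable even after the OLAP layer aggregates the data. The sketch below uses hypothetical `src_`, `stg_`, `edw_` and `olap_` tables in SQLite and simulates a row lost between stage and EDW; `first_break` reports the first pair of adjacent layers whose totals disagree.

```python
import math
import sqlite3

# Hypothetical tables for each layer: source -> stage -> EDW -> OLAP.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_sales (sale_id INTEGER, region TEXT, amount REAL);
CREATE TABLE stg_sales (sale_id INTEGER, region TEXT, amount REAL);
CREATE TABLE edw_sales (sale_id INTEGER, region TEXT, amount REAL);
CREATE TABLE olap_sales_by_region (region TEXT, total_amount REAL);
INSERT INTO src_sales VALUES (1, 'North', 100.0), (2, 'South', 250.0), (3, 'North', 50.0);
INSERT INTO stg_sales SELECT * FROM src_sales;
-- Simulated defect: one row is lost on the way from stage to EDW.
INSERT INTO edw_sales SELECT * FROM stg_sales WHERE sale_id <> 3;
INSERT INTO olap_sales_by_region SELECT region, SUM(amount) FROM edw_sales GROUP BY region;
""")

# Control-total query per layer, in load order.
LAYERS = [
    ("source", "SELECT SUM(amount) FROM src_sales"),
    ("stage", "SELECT SUM(amount) FROM stg_sales"),
    ("EDW", "SELECT SUM(amount) FROM edw_sales"),
    ("OLAP", "SELECT SUM(total_amount) FROM olap_sales_by_region"),
]


def control_totals(conn):
    """Run each layer's control-total query; an empty table yields 0.0."""
    return {name: conn.execute(sql).fetchone()[0] or 0.0 for name, sql in LAYERS}


def first_break(totals):
    """Return the first (upstream, downstream) pair whose totals differ, or None."""
    names = [name for name, _ in LAYERS]
    for upstream, downstream in zip(names, names[1:]):
        if not math.isclose(totals[upstream], totals[downstream]):
            return upstream, downstream
    return None


totals = control_totals(conn)
print(totals)               # {'source': 400.0, 'stage': 400.0, 'EDW': 350.0, 'OLAP': 350.0}
print(first_break(totals))  # ('stage', 'EDW')
```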
Critical ETL needs
As we have seen, ETL is needed for forecasting, reporting and analysing the business in order to capture customer needs more successfully. This enables the business to generate higher demand than in the past.
Here are a few of the critical needs without which ETL testing cannot be achieved:
- Data and tables identification – This is important because the source can contain a lot of irrelevant and unnecessary data that is of little importance when forecasting and analysing customer needs. Hence the relevant data and tables have to be selected before starting the ETL work.
- Mapping sheet – This is one of the critical needs in ETL work. Mapping the right source table to the right destination table is mandatory, and any problem or incorrect data in this sheet might impact the whole ETL deliverable.
- Table designs and data, column type – This is the next major step when mapping source tables to the destination tables. The column types have to match between the tables on both sides.
- Database access – The main requirement is access to the database where the ETL runs. Any restriction on access will have a corresponding impact on the testing.
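Matching column types between the source and destination tables can be automated by reading the table metadata. This sketch uses SQLite's `PRAGMA table_info`, which returns one row per column with the declared type in the third field; other databases expose similar metadata (for example through `INFORMATION_SCHEMA.COLUMNS` where it is supported). The table names and the type mismatch are hypothetical.

```python
import sqlite3

# Hypothetical source and destination tables: points was created with the wrong type
# in the destination, and email was not carried over at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customer (customer_id INTEGER, name TEXT, points INTEGER, email TEXT);
CREATE TABLE tgt_customer (customer_id INTEGER, name TEXT, points TEXT);
""")


def column_types(conn, table):
    """Map column name -> declared type, using SQLite's table_info pragma."""
    # Each pragma row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2].upper() for row in conn.execute(f"PRAGMA table_info({table})")}


def schema_differences(conn, source, target):
    """List (column, source_type, target_type) for mismatched or missing columns."""
    src, tgt = column_types(conn, source), column_types(conn, target)
    diffs = []
    for column, src_type in src.items():
        tgt_type = tgt.get(column)  # None when the column is missing in the target
        if tgt_type != src_type:
            diffs.append((column, src_type, tgt_type))
    return diffs


print(schema_differences(conn, "src_customer", "tgt_customer"))
# [('points', 'INTEGER', 'TEXT'), ('email', 'TEXT', None)]
```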
ETL reporting and testing
Reporting is especially important in ETL, as it explains customer needs to the clients and guides them. With it, they can forecast and analyse the exact customer needs.
For example, a company that manufactures silk fabric wanted to analyse its annual sales. On reviewing the report they generated, they found a tremendous fall in sales during August and September. Hence they decided to roll out promotional offers, such as exchanges and discounts, which boosted their sales.
Basic issues in ETL testing
There can be a number of issues while performing ETL testing, such as the following:
- Access to the source tables or views might not be valid.
- Column names and data types might not match between the source and the next layer.
- The number of records in the source table and the destination table might not match.
And there might be many more.
A mapping sheet typically has columns such as VIEW_NAME, COLUMN_NAME, DATA_TYPE, TABLE_NAME, COLUMN_NAME, DATA_TYPE and TRANSFORMATION LOGIC.
The first 3 columns describe the source database and the next 3 describe the next (target) database in the flow. The last column is very important: the transformation logic defines how data from the source is read and stored in the destination database. It depends on the business and the ETL needs.
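If the transformation logic in the mapping sheet is written as an SQL expression, each mapping row can be turned into a verification query: apply the logic to the source view and compare the result with the target column. The following sketch assumes exactly that, with a hypothetical mapping sheet, a `v_customer` view, a `stg_customer` table and a shared `cust_id` key. It uses SQLite's null-safe `IS NOT` operator so that a NULL on only one side still counts as a failure, and it assumes the source column names used in the logic do not clash with target column names, since the expression is evaluated over a join of both.

```python
import sqlite3

# Hypothetical mapping-sheet rows, one per target column, in the column order
# VIEW_NAME, COLUMN_NAME, DATA_TYPE, TABLE_NAME, COLUMN_NAME, DATA_TYPE, TRANSFORMATION LOGIC.
# The transformation logic is assumed to be a SQL expression over the source view.
MAPPING_SHEET = [
    ("v_customer", "cust_name", "TEXT", "stg_customer", "customer_name", "TEXT",
     "UPPER(TRIM(cust_name))"),
    ("v_customer", "joined_on", "TEXT", "stg_customer", "join_year", "INTEGER",
     "CAST(substr(joined_on, 1, 4) AS INTEGER)"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER, cust_name TEXT, joined_on TEXT);
CREATE VIEW v_customer AS SELECT cust_id, cust_name, joined_on FROM customer;
CREATE TABLE stg_customer (cust_id INTEGER, customer_name TEXT, join_year INTEGER);
INSERT INTO customer VALUES (1, ' asha ', '2019-06-01'), (2, 'ravi', '2021-01-15');
-- Simulated defect: the name for cust_id 2 was loaded without UPPER().
INSERT INTO stg_customer VALUES (1, 'ASHA', 2019), (2, 'ravi', 2021);
""")


def failing_keys(conn, row, key="cust_id"):
    """Keys where the target column differs from the transformed source value."""
    view, _src_col, _src_type, table, tgt_col, _tgt_type, logic = row
    sql = (f"SELECT s.{key} FROM {view} s JOIN {table} t ON s.{key} = t.{key} "
           f"WHERE ({logic}) IS NOT t.{tgt_col} ORDER BY s.{key}")
    return [r[0] for r in conn.execute(sql)]


def check_mapping(conn, sheet):
    """Run the check for every mapping row, keyed by target column name."""
    return {row[4]: failing_keys(conn, row) for row in sheet}


print(check_mapping(conn, MAPPING_SHEET))  # {'customer_name': [2], 'join_year': []}
```

The join only covers keys present on both sides; rows missing from the target are left to the completeness (count) check.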
Points to remember during ETL test planning and execution
The most important thing in ETL testing is loading data based on the extraction criteria from the source DB. When these criteria are invalid or obsolete, there will be no data in the table to perform ETL testing on, which brings in more issues.
The following are a few points to take care of during ETL test planning and execution:
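Since an invalid or obsolete extraction criterion silently produces an empty data set, it is worth failing fast when extraction returns nothing. A minimal sketch, assuming the criterion is passed as a query parameter; the `purchase` table, the `EmptyExtractionError` exception and the `extract` helper are hypothetical.

```python
import sqlite3


class EmptyExtractionError(Exception):
    """Raised when the extraction criteria select no source rows."""


def extract(conn, sql, params=()):
    """Run the extraction query and refuse to continue with an empty result."""
    rows = conn.execute(sql, params).fetchall()
    if not rows:
        raise EmptyExtractionError(f"Extraction returned no rows for params {params!r}")
    return rows


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase (purchase_id INTEGER, purchase_date TEXT, amount REAL);
INSERT INTO purchase VALUES (1, '2024-01-10', 20.0), (2, '2024-02-14', 35.0);
""")

EXTRACTION_SQL = "SELECT * FROM purchase WHERE purchase_date >= ?"

print(len(extract(conn, EXTRACTION_SQL, ("2024-01-01",))))  # 2
try:
    extract(conn, EXTRACTION_SQL, ("2030-01-01",))  # an obsolete criterion
except EmptyExtractionError as err:
    print("Blocked:", err)
```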
#1: Data is extracted from heterogeneous data sources
#2: Handling the ETL process in an integrated environment that has different:
- Communication protocols
#3: The need for a logical data mapping sheet before the physical data can be transformed
#4: Understanding and examining the data sources
#5: Initial load and the incremental load
#6: Audit columns
#7: Loading the facts and the dimensions
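Points #5 and #6 often work together: an audit column such as a last-updated timestamp lets the incremental load pick up only the rows changed since the previous run, while the initial load takes everything. The sketch below assumes hypothetical tables with an `updated_at` audit column stored as sortable ISO-8601 text, and uses SQLite's `INSERT OR REPLACE` as the upsert (other databases typically use `MERGE` or an equivalent).

```python
import sqlite3

# Hypothetical tables; updated_at is the audit column, stored as ISO-8601 text
# so that string comparison matches chronological order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customer (customer_id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT);
CREATE TABLE tgt_customer (customer_id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT);
INSERT INTO src_customer VALUES
  (1, 'Asha', '2024-01-01 10:00:00'), (2, 'Ravi', '2024-01-02 09:30:00');
""")


def load(conn, watermark=None):
    """Initial load if watermark is None, else incremental. Returns (new_watermark, rows_loaded)."""
    sql = "SELECT customer_id, name, updated_at FROM src_customer"
    params = ()
    if watermark is not None:
        sql += " WHERE updated_at > ?"
        params = (watermark,)
    rows = conn.execute(sql, params).fetchall()
    # Upsert on the primary key so that changed rows overwrite their old version.
    conn.executemany("INSERT OR REPLACE INTO tgt_customer VALUES (?, ?, ?)", rows)
    conn.commit()
    new_watermark = conn.execute("SELECT MAX(updated_at) FROM tgt_customer").fetchone()[0]
    return new_watermark, len(rows)


watermark, n = load(conn)  # initial load
print(watermark, n)        # 2024-01-02 09:30:00 2

# Source changes after the initial load: one update and one new customer.
conn.execute("UPDATE src_customer SET name = 'Ravi K', updated_at = '2024-01-05 08:00:00' "
             "WHERE customer_id = 2")
conn.execute("INSERT INTO src_customer VALUES (3, 'Meena', '2024-01-04 12:00:00')")

watermark, n = load(conn, watermark)  # incremental load
print(watermark, n)                   # 2024-01-05 08:00:00 2
```

Because the filter uses a strict `>` against the last watermark, a row committed later with exactly the watermark timestamp would be missed; real pipelines usually guard against this, for example with an overlap window.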
ETL Tools and their usage
ETL tools are used to build the transformation logic and apply it while moving data from the source into another system. They also let you map schemas from the source to the destination, transform and clean up data before it is moved into the destination, and load it efficiently at the destination.
This can significantly reduce manual effort, since the mapping built in the tool is used for almost all ETL validation and verification.
- Informatica – PowerCenter – is one of the popular ETL tools, introduced by Informatica Corporation. It has a large customer base covering a wide range of areas. The major components of the tool are its client tools, repository tools and servers.
- IBM – InfoSphere Information Server – IBM, a market leader in computer technology, developed InfoSphere Information Server in 2008 for information integration and management.
- Oracle – Data Integrator – Oracle Corporation developed its ETL tool under the name Oracle Data Integrator. Growing customer demand has led Oracle to release the tool in various versions.
More examples of ETL testing usage:
Consider an airline that wants to roll out promotions and offers to attract customers strategically. First, it will try to understand the specific demands and needs of its customers. To achieve this, it will require historical data, preferably from the previous 2 years. Using this data, it will analyze and prepare reports that help in understanding the customers' needs.
The reports can be of the following kinds:
- Customers from region A who travel to region B on certain dates
- Customers meeting specific age criteria who travel to city XX
And there can be many other reports.
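The two airline reports above map directly onto filter queries over the loaded booking data. The following sketch assumes a hypothetical `booking` table in SQLite; the regions, city `XX` and the age band are placeholders matching the examples, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical booking data standing in for the historical data loaded by the ETL process.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE booking (passenger_id INTEGER, age INTEGER, origin_region TEXT,
                      dest_region TEXT, dest_city TEXT, travel_date TEXT);
INSERT INTO booking VALUES
  (1, 34, 'A', 'B', 'City1', '2023-12-22'),
  (2, 67, 'A', 'B', 'XX',    '2023-12-24'),
  (3, 29, 'C', 'B', 'XX',    '2023-12-23'),
  (4, 71, 'A', 'C', 'XX',    '2024-05-01');
""")


def region_to_region(conn, origin, dest, start, end):
    """Passengers from region `origin` travelling to region `dest` between two dates."""
    rows = conn.execute(
        """SELECT DISTINCT passenger_id FROM booking
           WHERE origin_region = ? AND dest_region = ? AND travel_date BETWEEN ? AND ?
           ORDER BY passenger_id""",
        (origin, dest, start, end),
    )
    return [r[0] for r in rows]


def age_band_to_city(conn, min_age, max_age, city):
    """Passengers within an age band travelling to a given city."""
    rows = conn.execute(
        """SELECT DISTINCT passenger_id FROM booking
           WHERE age BETWEEN ? AND ? AND dest_city = ?
           ORDER BY passenger_id""",
        (min_age, max_age, city),
    )
    return [r[0] for r in rows]


print(region_to_region(conn, "A", "B", "2023-12-20", "2023-12-31"))  # [1, 2]
print(age_band_to_city(conn, 60, 120, "XX"))                         # [2, 4]
```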
Analyzing these reports helps the clients identify the kinds of promotions and offers that will benefit the customers and, at the same time, the business, creating a win-win situation. This can be achieved easily with ETL testing and reports.
In parallel, the IT department notices a serious DB issue that has stopped multiple services and in turn has the potential to impact the business. On investigation, it is found that some invalid data has corrupted a few databases, which need to be corrected manually.
In the former case, it is ETL reports and testing that will be required.
In the latter case, DB testing has to be done properly to overcome issues with invalid data.
I hope the above tutorial has provided a simple and clear overview of what ETL testing is and why it has to be done, along with the business impacts and benefits it yields. It does not stop here: ETL can be extended to provide foresight into business growth.
About the author: This tutorial is written by Nagarajan. He is a test lead with over 6 years of software testing experience in various functional areas such as Banking, Airlines and Telecom, covering both manual and automation testing.