Learn All About Data Warehousing Fundamentals. This In-Depth Guide Explains What is Data Warehousing Along with its Types, Characteristics, Merits, and Demerits:
A data warehouse is the latest storage trend in today’s IT industry.
This tutorial is going to explain What is a Data Warehouse? Why is Data Warehousing crucial? Types of Data Warehouse applications, Characteristics of a Data Warehouse, Benefits and Disadvantages of Data Warehousing.
List of Data Warehousing Tutorials In This Series:
Tutorial #1: Data Warehousing Fundamentals
Tutorial #2: Whats is ETL Process in Data Warehouse?
Tutorial #3: Data Warehouse Testing
Tutorial #4: Dimensional Data Model in Data Warehouse
Tutorial #5: Schema Types in Data Warehouse Modeling
Tutorial #6: Data Mart Tutorial
Tutorial #7: Metadata in ETL
Overview Of Tutorials In This Data Warehousing Series
Tutorial_Num | What You Will Learn |
---|---|
Tutorial #1 | Data Warehousing Fundamentals Learn all about Data Warehousing Concepts from this tutorial. This in-Depth guide explains What Data Warehousing is Along with its Types, Characteristics, Merits and Demerits. |
Tutorial #2 | Whats is ETL Process in Data Warehouse? This in-depth tutorial on ETL Process explains Process Flow & Steps Involved in the ETL (Extraction, Transformation, and Load) Process in Data Warehouse. |
Tutorial #3 | Data Warehouse Testing Goals & Significance of Data Warehouse Testing, ETL Testing Responsibilities, Errors in DW and ETL Deployment in detail in this tutorial. |
Tutorial #4 | Dimensional Data Model in Data Warehouse This tutorial explains the Benefits & Myths of Dimensional Data Model in Data Warehouse. You will also learn about Dimension Tables & Fact Tables with Examples. |
Tutorial #5 | Schema Types in Data Warehouse Modeling This tutorial explains various Data Warehouse Schema Types. Learn What is Star Schema & Snowflake Schema and the difference between Star & Snowflake Schema. |
Tutorial #6 | Data Mart Tutorial This tutorial explains Data Mart Concepts including Data Mart Implementation, Types, Structure as Well as Differences Between Data Warehouse Vs Data Mart. |
Tutorial #7 | Metadata in ETL This tutorial explains the Role of Metadata in ETL, Examples & Types of Metadata, Metadata Repository & Challenges in Metadata Management. |
Table of Contents:
Data Warehousing Fundamentals: A Complete Guide
Target Audience
- Data warehouse/ETL developers and testers.
- Database professionals with basic knowledge of database concepts.
- Database administrators/Big data experts who want to understand data warehousing concepts.
- College graduates/Freshers who are looking for data warehouse jobs.
What Is Data Warehousing?
A Data Warehouse (DW) is a repository of huge amount of organized data. This data is consolidated from one or more different data sources. DW is a relational database that is mainly designed for analytical reporting and on-time decision making in organizations.
The data for this purpose is isolated and optimized from the source transaction data, which will not have any impact on the main business. If an organization introduces any business change, then DW is used to examine the effects of that change, and hence DW is also used to monitor the non-decision making process.
The data warehouse is mostly a read-only system as operational data is very much separated from DW. This provides an environment to retrieve the highest amount of data with good query writing.
Thus DW will act as the backend engine for Business Intelligence tools which shows the reports, dashboards for the business users. DW is extensively used in banking, financial, retail sectors, etc.
Why Is Data Warehousing Crucial?
Enlisted below are some of the reasons for which Data Warehouse is crucial.
- Data warehouse gathers all the operational data from several heterogeneous sources of “different formats” and through the process of extract, transform and load (ETL) it loads the data into DW in a “standardized dimensional format” across an organization.
- Data warehouse maintains both “current data and historical data” for analytical reporting and fact-based decision making.
- It helps organizations to take “smarter and quick decisions” on reducing costs and to increase the revenue, by comparing quarter and annual reports to improve their performance.
Types Of Data Warehouse Applications
Business Intelligence (BI) is a branch of data warehousing designed for decision making. Once the data in the DW is loaded, BI plays a major role by analyzing the data and presenting it to the business users.
Practically, the term “data warehouse applications” implies, in how many different types the data can be processed and utilized.
We have three types of DW Applications as mentioned below.
- Information processing
- Analytical processing
- Data mining which serves the purpose of BI
#1) Information Processing
This is a kind of application where the data warehouse allows direct one-one contact with the data stored in it.
As the data can be processed by writing direct queries on the data (or) with a basic statistical analysis on the data and the end results will be reported to the business users in the form of reports, tables, charts or graphs.
DW supports the following tools for Information Processing:
(i) Query Tools: The business (or) the analyst runs the queries using query tools to explore the data and generate the output in the form of reports or graphics as per the business requirement.
(ii) Reporting Tools: If the business wants to see the results in any defined format and on a scheduled basis i.e. daily, weekly or monthly then reporting tools will be used. These kinds of reports can be saved and reviewed at any time.
(iii) Statistics Tools: If the business wants to do an analysis on a broad view of data then statistics tools will be used to generate such results. Businesses can make conclusions and predictions by understanding these strategic results.
#2) Analytical Processing
This is a kind of application where a data warehouse allows the analytical processing of data stored in it. The data can be analyzed by the following operations as Slice-and-Dice, Drill Down, Roll Up and Pivoting.
(i) Slice-and-Dice: Data warehouse allows slice-and-dice operations to analyze the data accessed from many levels with a combination of different perspectives. The slice-and-dice operation internally uses the drill-down mechanism. Slicing works on dimensional data.
As a part of the business requirement, if we focus on a single area then slicing analyzes the dimensions of that particular area as per the requirements and gives the results. Dicing works on analytic operations. Dicing zooms for a specific set of attributes over all the dimensions to provide diverse perspectives. The dimensions are considered from one or more consecutive slices.
(ii) Drill Down: If the business wants to go to a more detailed level of any summary number, then drill down is an operation for navigating down that summary to minor detailed levels. This gives a great idea of what is happening and where the business has to be focused more closely.
Drill down tracks from the hierarchy level until the minor detail level for the root cause analysis. This can be easily understood with an example as sales drill down can happen from Country-level -> Region level -> State-level -> District level -> Store level.
(iii) Roll up: Roll up works opposite to the drill-down operation. If the business wants any summarized data, then roll up comes into the picture. It aggregates the detail level data by moving up in the dimensional hierarchy.
Roll-ups are used to analyze the development and performance of a system.
This can be understood with an Example as in a sales roll up where the totals can be rolled up from City level -> State-level -> Region level -> Country level.
(iv) Pivot: Pivoting analyzes dimension data by rotating the data on the cubes. For Example, the row dimension can be swapped into the column dimension and vice versa.
#3) Data Mining
This is a kind of application where the data warehouse allows knowledge discovery of the data and results will be represented with visualization tools. In the above two types of applications, the information can be driven by the users.
As the data goes vast in various businesses, it is difficult to query and drill down the data warehouse to get all possible insights into data. Then data mining comes into the picture to accomplish the discovery of knowledge.
This drives into the data with all the past associations, results etc and predicts the future. Hence this is data-driven and not user-driven. The data can be discovered by finding hidden patterns, associations, classifications, and predictions.
Data mining goes in-depth with the data to predict the future. Based on the predictions, it also suggests the actions to take.
Given below are the various activities of Data Mining:
- Patterns: Data mining discovers patterns that occur in the database. Users can provide the business inputs on which some knowledge of the patterns is expected for decision making.
- Associations/Relationships: Data mining discovers relationships between the objects with the frequency of their association rules. This relationship may be between two or more objects (or) it may discover the rules within the properties of the same object.
- Classification: Data mining organizes data in a set of pre-defined classes. So if any object is picked up from the data, classification associates the respective class label to that object.
- Prediction: Data mining compares a set of existing values to find the best possible future values/trends in business.
Hence, based on all the above results, Data mining also proposes a set of actions to be taken.
Characteristics Of A Data Warehouse
A data warehouse is built based on the following characteristics of data as Subject oriented, Integrated, Non-volatile and Time variant.
#1) Subject Oriented: We can define a data warehouse as subject-oriented as we can analyze data with respect to a specific subject area rather than the application of wise data. This provides results that are more defined for easy decision making. With respect to an education system, the subject areas could be students, subjects, marks, teachers, etc.
#2) Integrated: The data in the data warehouse is integrated from distinct sources such as other relational databases, flat files, etc. Such a vast amount of data is fetched for effective data analysis. But there may be data conflicts as different sources of data may be in dissimilar formats. The data warehouse brings all this data in a consistent format across the whole system.
#3) Non-volatile: Once the data is loaded into the data warehouse, it can’t be changed. Logically this is acceptable as frequent change of data will not let you analyze the data. The frequent changes in the operational database can be loaded into a data warehouse on a scheduled basis, during this process, new data gets added, however, the earlier data is not erased and it remains as historical data.
#4) Time-Variant: All the historical data along with the recent data in the Data warehouse play a crucial role to retrieve data of any duration of time. If the business wants any reports, graphs, etc then for comparing it with the previous years and to analyze the trends, all the old data that are 6 months old, 1-year-old or even older data, etc. are required.
Benefits Of A Data Warehouse
When a data warehouse system is productive, an organization gets the following benefits by using it:
- Enhanced Business Intelligence
- Increased System and Query Performance
- Business Intelligence from Multiple Sources
- Timely Access to Data
- Enhanced Data Quality and Consistency
- Historical Intelligence
- High Return on Investment
#1) Enhanced Business Intelligence: In the earlier days when Data Warehousing and Business Intelligence were not in, the business users and analysts used to take the decisions with a limited amount of data and with their own gut feeling.
DW & BI have brought a change by giving insights with real facts and with the real organization data which is gathered over a period of time. Business users can directly query any of the business processes data such as marketing, finance, sales, etc., based on their needs for strategic decision making and smart business decisions.
#2) Increased System And Query Performance: Data warehousing gathers bulky information from heterogeneous systems and places it under one system so that a single query engine can be used for fast data retrieval.
#3) Business Intelligence From Multiple Sources: Do you know how Business Intelligence generally works on data? It absorbs the data from multiple systems, subsystems, platforms and data sources to work on a project. However, Data warehouse resolves this problem for BI, by consolidating all the project data without any duplicates.
#4) Timely Access To Data: Business users will get benefited by spending less time on data retrieval. They have some tools handy, with which they can query the data with minimal technical knowledge and generate the reports. This makes business users spend sufficient time on data analysis rather than data gathering.
#5) Enhanced Data Quality And Consistency: Data warehousing transforms data with dissimilar source system formats into a single format. Hence, maybe the same business units which are sourcing the data to the data warehouse can reuse the DW repository for their business reports and queries.
Hence from the organization’s view, all the business units will stand by with consistent results/reports. Thus this good quality and consistent data help to run a successful business.
#6) Historical Intelligence: Data warehouse maintains all the historical data that are not maintained by any transactional systems. This large amount of data is used to analyze data for specific time duration and to report it, and to analyze the trends to predict the future.
#7) High Return on Investment (ROI): Anyone starts a business by expecting good returns on the investments, in terms of bigger profits and lesser expenses. In the real data world, many studies have proved that implementing the data warehouse and Business Intelligence systems generated high revenues and saved the cost.
By now you should be able to understand how a well-designed DW system adds benefits to your business.
Disadvantages Of Data Warehousing
Though it is a very successful system, it is good to know some of the pitfalls in the system:
- Creating a Data Warehouse is definitely a time-consuming and complex process.
- The maintenance cost is heavy as the system needs continuous upgrades. It might also increase if it is not properly utilized.
- Proper training should be given to the developers, testers, and users to understand the DW system and to implement it technically.
- There may be sensitive data that can’t be loaded into DW for decision making.
- Restructuring of any business processes (or) source systems has a major effect on DW.
Conclusion
We hope this introductory tutorial provided a background of data warehousing fundamentals. We had an in-depth look at all the fundamental concepts of Data Warehousing.
We learned the definition, types, characteristics, benefits and the disadvantages of Data Warehousing in this comprehensive tutorial.
=> Read Through The Easy Data Warehousing Training Series.