Oracle Data Warehouse Guide With Benefits, Architecture, Risks, And Comparison with OLTP (Online Transaction Processing) System:
In the previous tutorial of Comprehensive Guide to Oracle, we have learned about Oracle Products and Services in various domains such as applications, databases, OS, etc. This article will provide in-depth knowledge of Oracle Data Warehousing. But before that let us first understand the concept of Business Intelligence (BI).
Business Intelligence
Business Intelligence is a software domain that embodies certain methods, technologies, tools, and applications that helps in structuring, refining, and transforming bulk data into an intelligent and understandable format that can be utilized by customers for generating customized reports and also helps in taking business decisions.
Different options are available to serve this need like Data Warehousing, OLAP (Online Transaction Processing), Data Mining, Data Integration, Decision Engineering, Spreadsheets, etc.
Enterprise Data Warehousing (EDW) is one of the core components of BI that serves the analytical and reporting needs of enterprises. Data Warehouse is a Relational Database Management System (RDBMS), which holds consolidated data received from multiple sources for later use.
Table of Contents:
Oracle Data Warehouse Overview
Why is it termed as ‘Data Warehouse’?
Let’s try to recall the meaning of the word ‘warehouse’ in order to relate to the term ‘Data Warehouse’.
A physical warehouse is a repository that is used for storing goods received from various sources, which can be later on supplied to the customer based on their needs.
[image source]
Similarly, the data warehouse is a repository of data received from various source systems. These sources could be any storage systems like data marts, flat files, or any media storage devices holding data for different enterprise domains like HR, Sales, Operations, Resource Management, and Marketing, etc.
Purpose Of Having A Data Warehouse
An enterprise might have heard about the concept of data warehouse but they can be unsure whether they should include the one to their enterprise. Still, there would always be a need to dump data from different sources on common ground and archive them so that storage space can be released from transaction systems. This is where the Data Warehousing system becomes a business requirement.
To grow in the market, management should be good at decision making which one can only be made after studying the past trends of an organization thoroughly. Hence, this archived data is maintained in the data warehouse in a well-organized and computed format so that it can be referred for business analysis in the future.
Benefits Of Data Warehousing
Data Warehouse if implemented successfully could be beneficial in the following ways:
#1) It has simplified the jobs of analysts by delivering an enhanced version of business intelligence solutions. It extracts data from multiple source systems, transforms, and stores it which can be directly queried by the business for analysis.
It also offers various tools that support the following:
- Generation of customized business reports.
- Interactive dashboards displaying the required information.
- Capability to drill down through dashboards only to get the details.
- Data Mining & Trend Analysis.
#2) Even after receiving data from various source systems, data within a data warehouse stays consistent as a result of transformations that occurred during the ETL process. Consistent data gives confidence to a decision-maker in terms of accuracy.
#3) Data warehouses are also defined as a time saver as critical data required by stakeholders for making business decisions is available at a single location and can be retrieved easily.
#4) These are designed to hold historical data and hence can be queried to study trends during different time periods. It also helps stakeholders to derive the future growth path.
Risks Involved In Using Data Warehouse
Along with benefits, every new implementation involves a set of risks too that needs to be taken care of.
Enlisted below are some of the risk involved:
- Non-compatibility of source systems with the data warehousing system may end up doing a lot of manual work.
- Incorrect time estimation of the ETL process may lead to interrupted work.
- These are very high-end storage systems and thus need high maintenance. Any workflow or business changes may cost very high.
- Setting up a data warehouse is time taking process as it needs a lot of time to understand business flows and identify integration capabilities to design a warehouse.
- Data security is always a risk here as it holds age-old historic data which if leaked, can impact the business.
Comparison Of OLTP Vs Data Warehouse
The differences between OLTP and Data Warehouse can be understood from the below table.
OLTP | Data Warehousing |
---|---|
OLTP systems supports business transactions. | Data Warehouse supports business decisions taken after analyzing completed business transactions. |
Data stays volatile i.e. keeps changing | Data is not supposed to be changed. |
They holds the most recent data. | They holds the historical data. |
Holds the raw data without any calculations. | Holds summarized and well computed data. |
Data will be normalized. | Data will stay de-normalized. |
Size of Oracle database could vary from 50MB to 100GB. | Size of Oracle database could vary from 100GB to 2TB. |
Insert & Updates are the major operations performed by end users on OLTP systems. | Data Warehouses are majorly queried using SELECT statement and can only be updated using ETL services. |
Contrasting Data Warehouse And Data Mart
Data Warehouse and DataMart, aren’t both the terms sound similar and seem related to data storage.
Yes, they are related and both of them are used for storing data. The main difference between both of them is the capacity to hold the data and this difference helps end-users to choose the right storage unit for their systems.
Data Mart has less capacity for holding data as compared to the data warehouse and hence it can be considered as a subset of it. Data marts are usually identified to store limited data that could be of a particular department or line of business while data warehouses can be used to hold the consolidated data for all.
Let us take an example of an e-commerce website having various categories for merchandise like Fashion, Accessories, Household Items, Books and School Supplies, Electronics Appliances, etc.
So, Data marts can be designed to store the product data category wise while data warehouses can be used to store complete website data including history in one place.
Data marts are smaller in size, they can be created much faster without much analysis as is required for designing a data warehouse. However, it takes much effort to keep several data marts in sync so as to maintain data consistency.
Overview Of The ETL Process
ETL (Extraction, Transformation, and Loading) is a process of extracting data from different source systems, transforming and loading it to the Data Warehouse system. It is a complex process that needs to interact with a variety of source systems for data extraction and hence technically challenging as well.
Transformation again needs a lot of analysis to understand the format of source systems and bring data to the common format so that the same data can be stored in the data warehouse.
The ETL process is a recurring job that can run daily, weekly, or even monthly depending upon the business requirement.
Data Warehouse Architecture
Let’s understand the architecture of a Data Warehouse which is mainly designed to store refined data for predefined business requirements. The architecture consists of 5 components with data flow from top to bottom.
The components are as follows:
- Data Sources
- Data Staging
- Data Warehouse(Data Storage)
- Data Marts(Data Storage)
- Data Presentation
Let us understand all the stages enlisted above one by one.
#1) Data Sources
There are different source systems which act as an input to data warehouse systems.
These source systems can be:
- Relational databases like Oracle, DB2, MySQL, MS Access, etc. which can be used to record daily transactions of any organization. These daily business transactions could be related to ERP, CRM, Sales, Finance, and Marketing, etc.
- Flat files
- Web services
- RSS feeds and similar sources.
#2) Data Staging
Once the data sources are in place, the next step would be to extract this data from the source systems into the warehouse staging area.
As data has been retrieved from different systems that follow different storage formats, it’s required to restructure the data so as to bring it to a common format. Hence, data transformation takes place as a next step.
During transformation, data cleansing occurs which includes applying business rules, filtering data, redundancy removal, data formatting, data sorting, etc.
#3) Data Warehouse (Data Storage)
Once the data is extracted and transformed, it will be loaded into a multi-dimensional environment i.e. Data Warehouse. Now, this processed data can be used for analysis and other purposes by end-users.
#4) Data Marts (Data Storage)
As mentioned above that data is now ready to be consumed by end-users, there is an optional process of creating Data Marts as a next step. These data marts can be used to store summarized data of a particular department or a line of business for dedicated use.
For Example, separate data marts can be added for departments like Sales, Finance, and Marketing, etc. as a next step that will hold specific data and allows an analyst to perform detailed queries for business needs. It also prevents every other end-user from accessing the complete warehouse and hence makes the data secure.
#5) Data Access Tools(Data Presentation)
There are a number of predefined Business Intelligence tools that can be used by users to access data warehouses or data marts. These front-end tools are designed in an extremely user-friendly way by giving users a variety of options to access data.
The options are mentioned below:
- By applying the query to Oracle or any other databases directly through SQL.
- Report generation.
- Developing Application.
- Using Data Mining tools etc.
Few popular warehousing tools available in the market are:
- Analytix DS
- Amazon Redshift
- Ab Initio Software
- Code Futures
- Holistic Data Management
- Informatics Corporation
Cloud Data Warehousing
Data warehouses are excessively recognized by the world. The next question that arises: Are we using an optimized approach to deploy Data Warehouses?
Then Cloud Data Warehousing was introduced which take an upper hand on Enterprise Data Warehousing (EDW). The concept of Cloud-based Data Warehouses has offered various advantages.
These are as follows:
(i) Scalability: Data on cloud systems are easily scalable up and down with no hassles while it consumes a lot of time and resources to perform scaling on traditional data warehouses.
(ii) Cost Saving: Cloud-based data warehouses have made a remarkable difference to the investment required for a warehouse setup. They have reduced the bulk upfront cost by eliminating the cost of
-
- Maintaining hardware/server rooms.
- Staff required for maintenance.
- Other operational costs.
(iii) Performance: Performance is another factor that allowed cloud-based systems to dominate over traditional ones. If the business is expanded globally and data needs to be accessed from various parts of the world with quicker turnaround, cloud-based warehouses are best to use.
Massively Parallel Processing (MPP) is one of the collaborative processing methods used by warehouses to achieve the same.
(iv) Connectivity: As mentioned above, if data needs to be accessed from multiple geographical locations, users need excellent connectivity to these warehouses and a cloud-based warehouse offers the same.
Conclusion
We hope you all have got a fair idea about the Oracle Data Warehousing system after reading the above article. Let us know if you need any insights on a particular topic around data warehousing so that we can cover up the same in upcoming tutorials.