This Tutorial Explains Data Mart Concepts Including Data Mart Implementation, Types, Structure as Well as Differences Between Data Warehouse Vs Data Mart:
This tutorial will help you to learn Data Mart concepts in detail along with simple examples.
We will see What is a data mart? When do we need a data mart? Cost-effective data marting, Cost of a data mart, Types of data marts, Steps in implementing a data mart, the Structure of a data mart, When is a pilot Data Mart useful? Datamart drawbacks and the differences between Data Warehouse vs Data Mart.
- Data warehouse/ETL developers and testers.
- Database professionals with basic knowledge of database concepts.
- Database administrators/Big data experts who want to understand Data warehouse/ETL concepts.
- College graduates/Freshers who are looking for Data warehouse jobs.
What You Will Learn:
- What Is A Data Mart?
- Comparison Of Data Warehouse Vs Data Mart
- Types Of Data Marts
- Implementation Steps Of A Data Mart
- Structure Of A Data Mart
- When Is A Pilot Data Mart Useful?
- Drawbacks Of Data Mart
What Is A Data Mart?
A data mart is a small portion of the data warehouse that is mainly related to a particular business domain as marketing (or) sales etc.
The data stored in the DW system is huge hence data marts are designed with a subset of data that belongs to individual departments. Thus a specific group of users can easily utilize this data for their analysis.
Unlike a data warehouse that has many combinations of users, each data mart will have a particular set of end-users. The lesser number of end-users results in better response time.
Data marts are also accessible to business intelligence (BI) tools. Data marts do not contain duplicated (or) unused data. They do get updated at regular intervals. They are subject-oriented and flexible databases. Each team has the right to develop and maintain its data marts without modifying data warehouse (or) other data mart's data.
A data mart is more suitable for small businesses as it costs very less than a data warehouse system. The time required to build a data mart is also lesser than the time required for building a data warehouse.
Pictorial representation of Multiple Data Marts:
When Do We Need Data Mart?
Based on the necessity, plan and design a data mart for your department by engaging the stakeholders because the operational cost of data mart may be high some times.
Consider the below reasons to build a data mart:
- If you want to partition the data with a set of user access control strategy.
- If a particular department wants to see the query results much faster instead of scanning huge DW data.
- If a department wants data to be built on other hardware (or) software platforms.
- If a department wants data to be designed in a manner that is suitable for its tools.
Cost-Effective Data Mart
A cost-effective data mart can be built by the following steps:
- Identify The Functional Splits: Divide the organization data into each data mart (departmental) specific data to meet its requirement, without any further organizational dependency.
- Identify User Access Tool Requirements: There may be different user access tools in the market that need different data structures. Data marts are used to support all these internal structures without disturbing the DW data. One data mart can be associated with one tool as per the user needs. Data marts can also provide updated data to such tools daily.
- Identify Access Control Issues: If different data segments in a DW system need privacy and should be accessed by a set of authorized users then all such data can be moved into data marts.
Cost Of Data Mart
The cost of data mart can be estimated as follows:
- Hardware And Software Cost: Any newly added data mart may need extra hardware, software, processing power, network, and disk storage space to work on queries requested by the end-users. This makes data marting an expensive strategy. Hence the budget should be planned precisely.
- Network Access: If the location of the data mart is different from that of the data warehouse, then all the data should be transferred with the data mart loading process. Thus a network should be provided to transfer huge volumes of data which may be expensive.
- Time Window Constraints: The time taken for the data mart loading process will depend on various factors such as complexity & volumes of data, network capacity, data transfer mechanisms, etc.
Comparison Of Data Warehouse Vs Data Mart
|S.No||Data Warehouse||Data Mart|
|1||Complex and costs more to implement.||Simple and cheaper to implement.|
|2||Works at the organization level for the entire business.||The scope is limited to a particular department.|
|3||Querying the DW is difficult for business users because of huge data dependencies.||Querying the data mart is easy for business users because of limited data.|
|4||Implementation time is more may be in months or years.||Implementation time is less may be in days, weeks or months.|
|5||Gathers data from various external source systems.||Gathers data from a few centralized DW (or) internal (or) external source systems.|
|6||Strategic decisions can be made.||Business decisions can be made.|
Types Of Data Marts
Data marts are classified into three types i.e. Dependent, Independent and Hybrid. This classification is based on how they have been populated i.e. either from a data warehouse (or) from any other data sources.
Extraction, Transformation, and Transportation (ETT) is the process that is used to populate data mart's data from any source systems.
Let's take a look at each type in detail!!
#1) Dependent Data Mart
In a dependent data mart, data is sourced from the existing data warehouse itself. This is a top-down approach because the portion of restructured data into the data mart is extracted from the centralized data warehouse.
A data mart can use DW data either logically or physically as shown below:
- Logical View: In this scenario, data mart's data is not physically separated from the DW. It refers to DW data through virtual views (or) tables logically.
- Physical subset: In this scenario, data mart's data is physically separated from the DW.
Once one or more data marts are developed, you can allow the users to access only the data marts (or) to access both Data marts and Data warehouses.
ETT is a simplified process in the case of dependent data marts because the usable data is already existing in the centralized DW. The accurate set of summarized data should be just moved to the respective data marts.
An Image of Dependent Data Mart is shown below:
#2) Independent Data Mart
An independent data mart is best suitable for small departments in an organization. Here data is not sourced from the existing data warehouse. The Independent data mart is neither dependent on enterprise DW nor other data marts.
Independent data marts are stand-alone systems where data is extracted, transformed and loaded from external (or) internal data sources. These are easy to design and maintain until it is supporting simple department wise business needs.
You have to work with each phase of the ETT process in case of independent data marts in a similar way as to how the data has been processed into centralized DW. However, the number of sources and the data populated to the data marts may be less.
Pictorial representation of an Independent Data Mart:
#3) Hybrid Data Mart
In a hybrid data mart, data is integrated from both the DW and other operational systems. Hybrid data marts are flexible with large storage structures. It can also refer to other data marts data.
Pictorial representation of a Hybrid Data Mart:
Implementation Steps Of A Data Mart
The implementation of Data Mart which is considered to be a bit complex is explained in the below steps:
- Designing: Since the time business users request a data mart, the designing phase involves requirements gathering, creating appropriate data from respective data sources, creating the logical and physical data structures and ER diagrams.
- Constructing: The team will design all tables, views, indexes, etc., in the data mart system.
- Populating: Data will be extracted, transformed and loaded into data mart along with metadata.
- Accessing: Data Mart data is available to be accessed by the end-users. They can query the data for their analysis and reports.
- Managing: This involves various managerial tasks such as user access controls, data mart performance fine-tuning, maintaining existing data marts and creating data mart recovery scenarios in case the system fails.
Structure Of A Data Mart
The structure of each data mart is created as per the requirement. Data Mart structures are called Star joins. This structure will differ from one data mart to another.
Star joins are multi-dimensional structures that are formed with fact and dimension tables to support large amounts of data. Star join will have a fact table in the center surrounded by the dimension tables.
Respective fact table data is associated with dimension tables data with a foreign key reference. A fact table can be surrounded by 20-30 dimension tables.
Similar to the DW system, in star joins as well, the fact tables contain only numerical data and the respective textual data can be described in dimension tables. This structure resembles a star schema in DW.
Pictorial representation of a Star Join Structure.
But the granular data from the centralized DW is the base for any data mart's data. Many calculations will be performed on the normalized DW data to transform it into multidimensional data marts data which is stored in the form of cubes.
This works similarly as to how the data from legacy source systems is transformed into a normalized DW data.
When Is A Pilot Data Mart Useful?
A pilot can be deployed in a small environment with a restricted number of users to ensure if the deployment is successful before the full-fledged deployment. However, this is not essential all the time. The pilot deployments will be of no use once the purpose is met.
You need to consider the below scenarios that recommend for the pilot deployment:
- If the end-users are new to the Data warehouse system.
- If the end-users want to feel comfortable to retrieve data/reports by themselves before going to production.
- If the end-users want hands-on with the latest tools (or) technologies.
- If the management wants to see the benefits as a proof of concept before making it as a big release.
- If the team wants to if ensure all ETL components (or) infrastructure components work well before the release.
Drawbacks Of Data Mart
Though data marts have some benefits over DW they also have some drawbacks as explained below:
- Unwanted data marts that have been created are tough to maintain.
- Data marts are meant for small business needs. Increasing the size of data marts will decrease its performance.
- If you are creating more number of data marts then the management should properly take care of their versioning, security, and performance.
- Data marts may contain historical (or) summarized (or) detailed data. However, updates to DW data and data mart data may not happen at the same time due to data inconsistency issues.
Many organizations are gearing towards data marts from a cost-saving perspective. Hence this tutorial has focused on the technical aspects of data marts in the data warehouse system.
Metadata in ETL is explained in detail in our upcoming tutorial.