This Data Modeling Tutorial explains what is Data Modeling, types of Data Models & comparison of conceptual vs logical vs physical data models:
In today’s competitive digital world, data is an essential business asset that represents information based on which decisions are made to lead in the market. To process such data in the database, a data model is created, and this process is known as Data Modeling.
In this tutorial, we will understand the concept of Data Modeling with its types. We will also see the 5 different tools and techniques for data modeling, i.e. Hierarchical, Network, Object Oriented, Entity Relationship, and Relational.
Understanding Data Modeling
It is the process of creating a model to represent the way the data of the application is going to be stored.
Further, we will also see some steps to improve and enhance data models in order to completely understand the business need and client expectations.
Significance Of Data Management In Data Model
The significance of effective data management for an efficient and successful business is enormous. Data Modeling lays the foundation for data management.
Data Modeling salient features are:
- It lays the foundation for data governance and management.
- It is the process of creating a model to represent the way the data of the application is going to be stored.
- It is neither the actual data nor the database that is to be used in the system.
- It represents the data entities, the relationship between different data entities, and the business rules, policies, etc. that govern the data and their activities.
- It helps in standardization mainly in naming the entities, setting default values, their semantics, constraints, and security, etc.
- The normalization process ensures redundant data are removed and the relationship and dependencies established are as per business.
Organization data scattered view:
Data Modeling basic view:
The key parameters that contribute to good data modeling are:
- Clarity
- Consistency
- Accuracy
- Ease of access
- Flexibility
- Performance
- Productivity
- Traceability.
Data Model is like a proposed design presented by an architect of your home.
The layout has a view of all rooms presented with the proposed layout of electricity, plumbing, and other important points. Its objective is to make the client see in the visual mode that all his requirements have been met and how the end result is going to be. Also to see if anything needs to be changed, removed, or propose an alternative.
Data Model: Goal And Significance
The main goal and significance of data modeling to business are as below:
- To facilitate the requirement discussion by presenting a view that makes it easy for all non-IT and less technical stakeholders to understand the details of the data to be stored.
- Prevention of data omission and helps in identifying redundant data.
- Improve the business process by understanding and suggesting business rules and constraints that can be implemented proactively. These observations are generally reported by the client post-implementation.
- This collaborative approach reduces the implementation time as it reduces the number of changes required to be done to code post-development.
- This process encourages collaboration between departments, the IT team, and business stakeholders.
- The inputs from business and technical people make the database design more robust, flexible, and efficient.
- Facilitates the developer to design the actual database and store data properly defining the relationship, keys, and constraints properly
What Is Data Modeling
- A data model is an abstract view of the data referred to in the application being developed.
- The data model view collates all the data elements referred to by the business.
- It presents the collated data in an organized way.
- Facilitates the development of the system.
- Each data represents a real-world business entity.
- The view depicts the relationship between data entities in the real world.
- It presents the basic properties and rules if any, that guide the data element.
For example, an application for a car manufacturer
Entity 1 - Car | ---Relationship ---->> (specific car sold i.e., the car id) | Entity 2 - Customer | ||
---|---|---|---|---|
Attribute | Attribute Property & Rules | Attribute | Attribute Property & Rules | |
Color | String | Name | String | |
Engine | String | Address | Cannot be numeric | |
Car Model | String | |||
Purchase date | dd/mm/yyyy Rule – Purchase date cannot be future dated |
In a different application, say a Tax management application, the entity ‘Assets’ can have a data car. Car is an entity in application 1 and is an actual data content in application 2.
To summarize, the attribute and rules governing a data entity will differ from application to application depending on the requirement of the application.
Data Model is a framework
- That helps in building the information system
- Facilitates easy access to the business data
- Helps management to take decisions
- Facilitate business to function efficiently
- Presents a view for
- All the stakeholders to get a clear and easy understanding of what information is going to be stored
- The core rules and policies that will govern each entity
Data Model does not store the data but is just a representation of how data will be managed during development.
Types Of Data Models
These are of 3 types:
- Conceptual Data Model: It gives a view of the business significance of each data entity and not a technical detail of data.
- Logical Data Model: It gives a detailed description of each data entity their attributes and the relationship between two entities giving business purpose to each data.
- Physical Data Model: It gives a technical view of the data i.e., the table name, column name, datatype, constraints, indexes, primary key, triggers, stored procedures, etc.
#1) Conceptual Data Model
Key features:
- Represents the business data as seen in the real world and as understood by businesses.
- It facilitates an easy understanding of the business data, concepts, and rules.
- The contributors are generally Business stakeholders and Data Architects.
- It makes communication easy and clear as all stakeholders refer to specific entities with the same nomenclature and thus reducing communication gaps.
- The main three components or elements of the Conceptual Data Model are
- Entity
- Attributes
- Relationship.
- This view, in a simple way, presents all the data entities referred to by the business and their characteristics called attributes and the dependency or the connection between entities called Relationships.
- In this data modeling, there are no details of the actual database or actual data that will be stored.
Extending the above example as explained in the image “Data Modeling basic view”,
The two business entities are Customer and Product. The attributes of the two entities are detailed in the Entity box.
The relationship between the two entities is the transaction -> ‘Sale’.
The relationship is many to many as one customer can have multiple products and one product can have multiple customers.
Conceptual Model Diagram:
Some commonly used Relationship notations are:
- One to one: Employee and his joining details
- One to many: Employee and his workplace
- One and only one: Employee and his date of birth
- Zero or one: Employee and his place of demise
- Zero or many: Employee and his subordinates
Examples of cardinality: Relationship Notation
#2) Logical Data Model
Key features:
- A logical data model is an extension of a conceptual model with more details.
- It is prepared by a data architect with inputs from the business.
- Logical data models give a detailed structure of the data elements and the data entity relationships in a system from the business point of view.
- This model is generally represented using UML notation.
- It is independent of technology and technology changes do not impact the model.
- It’s the blueprint that guides and helps in navigating the rapid changes in technology and customer needs.
- The logical model lays the foundation for the physical database
- The data attributes represented typically are the data type and their relationship with other entities.
- In this presentation, all attributes of an entity need to be mentioned without any omission.
- The relationship is not generic as presented in the conceptual data model
- The relationship is presented more explicitly with details like,
- Primary key
- Foreign key
- Parent-child dependent entity type
- Non-identifying relation
- The cardinality of the relationship.
Extending the above example as explained in the image “Data Modeling basic view”,
The business entities are Customer, Product, and Sale. All the attributes of the three entities are detailed in the Entity box. The relationship between the entities is more specific. The sale is the child entity of the Customer and Product Entity.
The relationship is many to many as a customer and a product can have multiple sales and a sale can have multiple customers and products. But every sale record should have an existing product and an existing customer. So, Sales is a Child entity, and Product and Customer are Parent Entities.
Logical Model Diagram:
#3) Physical Data Model
Key features:
- A physical data model is the layout of the actual database with all its components and services.
- This model depicts the actual database where the actual data is stored and is accessed in the production environment when users use the application.
- This model is for developers and other users who desire to query the actual data
- It is based on technology and the choice that has been made for the database and the version of the database i.e., ORACLE, SQL, INGRES, etc.
- It is an extension of the Logical model and has more details like table names for the Entity. Column names are the attributes with more specifications like their exact datatype, length, and precision.
- It may display other attributes like primary keys, indexes, security roles, constraints, stored procedures, views, etc.
- In addition, it may also have derived data, i.e., data generated by a process in the application that needs to be stored for future reference.
Physical Model Diagram:
Extending the above example as explained in the image “Data Modeling basic view”,
The business entities Customer, Product, Sale. All their attributes are detailed in the Entity box. Each attribute’s data type and length and precision also can be mentioned.
The relationship between the entities is more specific. The individual entities are also clear with their primary keys and foreign keys. The diagram can also present stored procedures and rules and constraints.
Conceptual Vs Logical Vs Physical Data Model
Model | Conceptual | Logical | Physical |
---|---|---|---|
Owner / Creator | Business Stakeholders & Requirement Team members | Data Architects & Business Analysts | Database Administrator & Developers |
Purpose | Facilitates communication between the technical and non-technical stakeholders in capturing all the information / data, their rules and processes | Facilitates understanding of business entities, business flow, rules and helps in creation of actual physical database | Facilitates implementation of the actual entities with actual business data and to be used by the application in the production environment |
Complexity | Simple | Complex | Very Complex |
Entity Names & Relationship | Yes, presented | Yes, presented | Not presented |
Entity Attributes | Not presented | Yes, presented | Not presented |
Keys (Primary & Foreign) | Not presented | Yes, presented | Yes, presented |
Table & Column Names | Not presented | Not presented | Yes, presented |
Column Data Type & Constraints | Not presented | Not presented | Yes, presented |
Naming convention restrictions | Use Business names with no restrictions | Use Business names with no restrictions | Naming conventions and the length governed by the DBMS |
Data Elements | All business data elements referred in the business | All business data elements that will be referred by the system | In addition to data elements of Logical model will have flags, time stamp and other data elements referred by the application from technical point of view and also some derived data generated by the application |
Data Modeling Techniques And Tools
Data Models are like documents that get updated with changing business requirements or enhancements. This is referred to initially by stakeholders, developers, business analysts, etc. These documents can be shared with others like vendors, partners, etc.
The need to convey the same message to all the readers, and standards needs to be maintained. Data modeling has some formal standardized format, techniques, and schemas like flowcharts. This way there is a common understanding across all, within and outside the organization.
There are 5 types of Data Modelling Techniques
#1) Hierarchical
- This model was one of the earliest and can represent simple data models.
- This model could handle one-to-many and one-to-one relations only.
- It has a root or a parent node and then follows a tree-like structure with other child nodes.
- Only one parent is allowed for a child node, and a parent node can have multiple child nodes.
- The navigation path to a child node is always through a parent node, so is slow and difficult.
- If a parent is deleted, then all child nodes get deleted. It is very easy and simple to understand.
- Complex data models cannot be represented using this technique.
- Data integrity is maintained in the parent-child relationship i.e., any change in the Parent is reflected in the child entity.
- In the below chart, to reach a Professor’s data one has to navigate through the College head and then to the department head.
#2) Network
- This model was the next one to be evolved from the hierarchical model.
- Here, a child node could have multiple parent nodes.
- This model could handle many-to-many relations in addition to one-to-one and one-to-many.
- Navigation is faster in this model, as there are multiple paths to reach a child entity.
- This model also gets very complex when data is large.
- The data update is more complex than the hierarchical model.
#3) Object-Oriented
- Object-oriented databases gained popularity with object-oriented programming.
- A real-world entity is represented as an object and the object’s properties are the real-world entities’ attributes.
- The model presents a view as a collection of objects.
- Each object has its methods and features.
#4) Entity Relationship
- This model represents the system at a very high level.
- Real-world problems can be represented through this model.
- It is very easy for all technical and non-technical stakeholders to understand.
- It is referred by database architects and developers to build the physical database.
- The three main elements are:
- Entity
- Its attributes
- And relationship with other entities
- Several ER modeling tools are available in the market.
- In the diagram below,
- The two entities are Professor and Subject.
- Some of the attributes of these entities are also depicted in the diagram.
- The relationship between the two entities is through ‘Teaching’.
#5) Relational
- The relational Model is the most popular model.
- In this model, the Entity information is represented as a table.
- All the attributes are represented as columns.
- The relationship between entities is represented by different standard notations.
- The primary key and foreign key of each entity are also mentioned.
Some Data Modeling Tools are:
- MySQL Workbench
- ER/Studio
- erwin Data Modeler
- SQuirreL SQL Client
- Draw.io
- Lucidchart
- Amundsen
- Postico
- Navicat
- Datagrip
Recommended Reading =>> 10 Best Data Modeling Tools To Manage Complex Designs
Enhance And Improve Data Models
Steps to enhance and improve data models are as follows:
- Get a clear understanding and clarity of the business needs and client expectations.
- Present the information gathered in detail in visual mode with one of the data modeling techniques.
- Keep enhancing and updating the visual presentation as the discussions progress.
- Present the information or data that is needed in the application.
- Collate and keep the list of the information that is being ignored as it is redundant for the application.
- If there are data that is being derived or calculated, then all this needs to be presented in the model. This will help to get all the stakeholders on the same page.
- Present all the correlations between data/entity and also try to present any causal analysis, if observed i.e., the impact of one transaction on the other, etc.
- Timely and frequent review of the model presentation should be part of a regular process
Frequently Asked Questions
Q #1) What are Normalization and de-Normalization done for?
Answer: Normalization is done to remove redundant data and data which is not referred to in the application. Its main objective is to reduce complexity, ensure the correctness of the relationship, and all data are grouped and collated correctly.
De-Normalization is the process of adding redundant data to a normalized model for better reading and documentation purposes but this has an impact on storage and performance.
Q #2) What are the issues faced during Data Modeling?
Answer: The common issue encountered during data modeling are – Overloading Data model. If the number of Tables increases beyond say 200 the performance gets impacted and complexity also increases, resulting in failures and issues.
The model is just a representation and has no relation to the company’s work. The company’s goal, processes, and key focus areas are not understood and captured. The application built on this basis will not yield an impactful result. Normalization and De-normalization should be done only if there is a strong reason.
Q #3) What are the Primary key and Foreign key?
Answer: Primary key is a column or group of columns that has a unique value for every row in the table. In a table containing students’ data, each row represents a specific student’s data. So, the primary key i.e., a unique key (student-id) should identify each student. A foreign key is a column or a group of columns that link two tables.
A child table’s foreign key will be linked to the parent table’s primary key. A student’s table can be linked to the result table by linking the student-id of the result table to a primary key in the student’s table. This will help in displaying the student profile from the student’s table and the result from the result table of a specific student.
Q #4) Give different constraints with examples?
Answer: Listed below few constraints
- Check value & Not null example: ROLL_NO INT NOT NULL CHECK (ROLL_NO >1000)
- Default value example: EXAM_FEE INT DEFAULT 10000
- Uniqueness example: STU_ADDRESS VARCHAR (35) UNIQUE
Q #5) Name some DBMS software
Answer: Listed below are some DBMS software:
- Microsoft Access
- FoxPro
- dbase
- MySQL
- Oracle
- IBM DB2
- Microsoft SQL Server.
Conclusion
The data model is an abstract view of presenting real-world business data, its properties, and the relationship between data elements in an easy-to-understand way. The objective is to facilitate the understanding of how the data is going to be managed for all technical as well as non-technical people involved.
The models are not the actual business data but a step towards storing the data in the database.
The main three types of data models are the Conceptual Model, Logical Model, and Physical Model. The commonly used Data modeling techniques are Hierarchical, Network, Object-oriented, Relational, and Entity-Relationship. The techniques help in standardizing the way the data and entities are represented.
There are many tools available to draw data models like MySQL Workbench, ER/Studio, Erwin Data Modeler, etc.
Data Modeling lays the foundation for good data management as “The goal is to turn data into information, and information into insight.”
– By Carly Fiorina (ex CEO of Hewlett-Packard).