Top 24 Data Modeling Interview Questions for Data Engineers

By Sruthy

Sruthy, with her 10+ years of experience, is a dynamic professional who seamlessly blends her creative soul with technical prowess. With a Technical Degree in Graphics Design and Communications and a Bachelor’s Degree in Electronics and Communication, she brings a unique combination of artistic flair…

Full Bio >

Learn about our editorial policies.

Updated February 22, 2026

Edited by Kamila

Kamila is an AI-based technical expert, author, and trainer with a Master’s degree in CRM. She has over 15 years of work experience in several top-notch IT companies. She has published more than 500 articles on various Software Testing Related Topics, Programming Languages, AI Concepts,…

Full Bio >

Learn about our editorial policies.

Simplify your interview preparation process with these Top Data Modelling Interview Questions. Practice real-world scenarios from here and confidently land your dream job.

Here I am going to share some Data Modeling interview questions and detailed answers based on my experience during interview interactions in a few renowned IT MNCs.

The question-answer can be of great help if you get a chance to face or take an interview on Data Modeling.

Table of Contents:

Expert Quiz on Data Modeling Interview Questions: Test Your Skills Like a Pro
Data Modeling Interview Questions for Freshers
Interview Questions on Data Modeling for Experienced
Final Thoughts on Data Modeling Questions for Job Interview

Expert Quiz on Data Modeling Interview Questions: Test Your Skills Like a Pro

This ultimate quiz on Data Modeling Interview questions will help you to get hired anywhere. Try this expert quiz and crack any Data Modeling interview like a pro.

Data Modeling Interview Questions QUIZ

Master data modeling concepts and boost your interview confidence!

Question 1 of 15

Data Modeling Interview Questions for Freshers

Let’s start!

Q #1) What do you understand by Data Modeling?

Answer: Data Modeling is the diagrammatic representation that shows how entities are related to each other. It is the initial step towards database design. We first create the conceptual model, then the logical model, and finally move to the physical model.

The data models are created in the data analysis & design phase of the software development life cycle.

Q #2) Explain your understanding of different data models.

Answer: There are three types of data models – conceptual, logical, and physical. The level of complexity and detail increases from conceptual to logical to a physical data model.

The conceptual model shows a very basic, high-level design, while the physical data model shows a very detailed view of the design.

Conceptual Model will be just portraying entity names and entity relationships. Figure 1 shown in the later part of this article, depicts a conceptual model.
Logical Model will be showing up entity names, entity relationships, attributes, primary keys, and foreign keys in each entity. Figure 2, shown inside question#4 in this article, depicts a logical model.
Physical Data Model will show primary keys, foreign keys, table names, column names, and column data types. This view actually elaborates on how the model will be implemented in the database.

Q #3) Throw some light on your experience in Data Modeling with respect to projects you have worked on to date.

Note: This was the very first question in one of my Data Modeling interviews. So, before you step into the interview discussion, you should have a very clear picture of how data modeling fits into the assignments you have worked upon.

Answer: I have worked on a project for a health insurance provider company where we have interfaces build in Informatica that transform and process the data fetched from the Facets database and send out useful information to vendors.

Note: Facets is an end-to-end solution to manage all the information for the healthcare industry. My project used SQL Server 2012 to create the facets database.

We had different entities that were linked together. These entities were subscriber, member, healthcare provider, claim, bill, enrollment, group, eligibility, plan/product, commission, capitation, etc.

Below is the conceptual Data Model showing what the project looked like on a high-level

Figure 1:

Each of the data entities has its own data attributes. For example, a data attribute of the provider will be provider identification number, a few data attributes of the membership will be subscriber ID, member ID, one of the data attribute of claim will claim ID, each healthcare product or plan will have a unique product ID, and so on.

Q #4) What are the different design schemas in Data Modeling? Explain with an example.

Answer: There are two different kinds of schemas in data modeling

Star Schema
Snowflake Schema

Now, I will explain each of these schemas one by one.

The simplest of the schemas is the star schema, where we have a fact table in the center that references multiple dimension tables around it. All the dimension tables are connected to the fact table. The primary key in all dimension tables acts as a foreign key in the fact table.

The ER diagram (see Figure 2) of this schema resembles the shape of a star, and that is why this schema is named a star schema.

Figure 2:

The star schema is quite simple, flexible, and it is in de-normalized form.

In a snowflake schema, the level of normalization increases. The fact table here remains the same as in the star schema. However, the dimension tables are normalized. Due to several layers of dimension tables, it looks like a snowflake, and thus it is named as snowflake schema.

Figure 3:

Q #5) Which scheme did you use in your project & why?

Q #6) Which schema is better – star or snowflake?

Answer: (Combined for Q #5&6): The choice of a schema always depends upon the project requirements & scenarios.

Since a star schema is in a de-normalized form, you require fewer joins for a query. The query is simple and runs faster in a star schema. Coming to the snowflake schema, since it is in normalized form, it will require several joins as compared to a star schema. The query will be complex, and execution will be slower than a star schema.

Another significant difference between these two schemas is that the snowflake schema does not contain redundant data, and thus it is easy to maintain. Star schema has a high level of redundancy, and thus it is difficult to maintain.

Now, which one to choose for your project? If the purpose of your project is to do more dimension analysis, you should go for a snowflake schema. For example, if you need to find out “how many subscribers are tied to a particular plan which is currently active?” – go with the snowflake model.

If the purpose of your project is to do more of a metrics analysis, you should go with a star schema. For Example, if you need to find out “what is the claim amount paid to a particular subscriber?” – go with a star schema.

In my project, we used a snowflake schema because we had to do analysis across several dimensions and generate summary reports for the business. Another reason for using a snowflake schema was it has less memory consumption.

Q #7) What do you understand by dimension and attribute?

Answer: Dimensions represent qualitative data. For example, plan, product, and class are all dimensions.

A dimension table contains descriptive or textual attributes. For example, the product category & product name are the attributes of the product dimension.

Q #8) What is a fact & a fact table?

Answer: Facts represent quantitative data.

For example, the net amount due is a fact. A fact table contains numerical data and foreign keys from related dimensional tables. Figure 2, shown above, provides an example of the fact table.

Q #9) What are the different types of dimensions you have come across? Explain each of them in detail with an example.

Answer: There are typically five types of dimensions.

a) Conformed dimensions: A Dimension that is utilized in different areas is called a conformed dimension. It can be used with different fact tables in a single database or across multiple data marts/warehouses.

For example, if the subscriber dimension connects to two fact tables – billing and claim- then we treat the subscriber dimension as a conformed dimension.

b) Junk Dimension: It is a dimension table comprising attributes that don’t have a place in the fact table or in any of the current dimension tables. Generally, these are properties like flags or indicators.

For Example, it can be a member eligibility flag set as ‘Y’ or ‘N’ or any other indicator set as true/false, any specific comments, etc. If we keep all such indicator attributes in the fact table, then its size increases.

We combine all such attributes and place them in a single-dimensional table called a junk dimension, which has unique junk IDs that represent the possible combination of all indicator values.

c) Role-Playing Dimension: These are the dimensions that are utilized for multiple purposes in the same database.

The date dimension can serve as the “Date of Claim,” “Billing date,” or “Plan Term date.” So, such a dimension will be called a role-playing dimension. The primary key of the Date dimension will be associated with multiple foreign keys in the fact table.

d) Slowly Changing Dimension (SCD): These are the most important amongst all the dimensions. These are the dimensions where attribute values vary with time.

Below are the various types of SCDs.

Type-0: These are the dimensions where the attribute value remains steady with time. For example, Subscriber’s DOB is a type-0 SCD because it will always remain the same irrespective of the time.
Type-1: These are the dimensions where the previous value of the attribute is replaced by the current value. No history is maintained in the Type-1 dimension. For example, Subscriber’s address (where the business requires to keep only the current address of subscriber) can be a Type-1 dimension.
Type-2: These are the dimensions where unlimited history is preserved. For example: Subscriber’s address (where the business is required to keep a record of all the previous addresses of the subscriber). In this case, multiple rows for a subscriber will be inserted in the table with his/her different addresses. There will be some column(s) that will identify the current address. For example, ‘Start date’ and ‘End date’. The row where the ‘End date’ value will be blank would contain the subscriber’s current address, and all other rows will have the previous addresses of the subscriber.
Type-3: These are the type of dimensions where limited history is preserved. And we use an additional column to maintain the history. For example, the subscriber’s address (where the business is required to keep a record of the current & just one previous address). In this case, we can dissolve the ‘address’ column into two different columns – ‘current address’ and ‘previous address’. So, instead of having multiple rows, we will be having just one row showing the current and the previous address of the subscriber.
Type-4: In this type of dimension, the historical data is preserved in a separate table. The main dimension table holds only the current data. For example, the main dimension table will have only one row per subscriber holding its current address. All other previous addresses of the subscriber will be kept in a separate history table. This type of dimension is hardly ever used.

e) Degenerated Dimension: A degenerated dimension is a dimension that is not a fact but is present in the fact table as a primary key. It does not have its own dimension table. We can also call it a single attribute dimension table.

But, instead of keeping it separately in a dimension table and putting an additional join, we put this attribute in the fact table directly as a key. Since it does not have its own dimension table, it can never act as a foreign key in the fact table.

Q #10) Give your idea regarding factless fact? And why do we use it?

Answer: Factless fact table is a fact table that contains no fact measure in it. It has only the dimension keys in it.

At times, certain situations may arise in the business where you need to have a factless fact table.

For example, suppose you are maintaining an employee attendance record system, you can have a factless fact table having three keys.

Employee_ID

Department_ID

Time_ID

You can see that the above table does not contain any measures. Now, if you want to answer the question below, you can do so easily using the above single factless fact table rather than having two separate fact tables:

“How many employees of a particular department were present on a particular day?”

So, the factless fact table offers flexibility to the design.

Interview Questions on Data Modeling for Experienced

Q #11) Distinguish between OLTP and OLAP?

Answer: OLTP stands for the Online Transaction Processing System & OLAP stands for the Online Analytical Processing System. OLTP maintains the transactional data of the business & is highly normalized. On the contrary, OLAP is for analysis and reporting purposes & it is in a de-normalized form.

This difference between OLAP and OLTP also gives you a way to choose the design of schema. If your system is OLTP, you should go with a star schema design, and if your system is OLAP, you should go with a snowflake schema.

Q #12) What do you understand by a data mart?

Answer: Data marts are for the most part intended for a solitary branch of business. They are designed for the individual departments.

For example, I used to work for a health insurance provider company that had different departments in it, like Finance, Reporting, Sales, and so forth.

We had a data warehouse that was holding the information about all these departments, and then we have few data marts built on top of this data warehouse. These DataMart were specific to each department. In simple words, you can say that a DataMart is a subset of a data warehouse.

Q #13) What are the different types of measures?

Answer: We have three types of measures, namely

Non-additive measures
Semi- additive measures
Additive measures

Non-additive measures are the ones on top of which no aggregation function can be applied. For example, a ratio or a percentage column; a flag or an indicator column present in fact table holding values like Y/N, etc., is a non-additive measure.

Semi-additive measures are the ones on top of which some (but not all) aggregation functions can be applied. For example, the fee rate or account balance.

Additive measures are the ones on top of which all aggregation functions can be applied. For example, units purchased.

Q # 14) What is a Surrogate key? How is it different from a primary key?

Answer: Surrogate Key is a unique identifier or a system-generated sequence number key that can act as a primary key. It can be a column or a combination of columns. Unlike a primary key, it is not picked up from the existing application data fields.

Q #15) Is this true that all databases should be in 3NF?

Answer: A database doesn’t need to be in 3NF. However, if your purpose is the easy maintenance of data, less redundancy, and efficient access, then you should go with a de-normalized database.

Q #16) Have you ever come across the scenario of recursive relationships? If yes, how did you handle it?

Answer: A recursive relationship occurs where an entity is related to itself. Yes, I have come across such a scenario.

Talking about the healthcare domain, it is a possibility that a healthcare provider (say, a doctor) is a patient to any other healthcare provider. Because if the doctor himself falls ill and needs surgery, he will have to visit another doctor for the surgical treatment.

So, in this case, the entity – health care provider is related to itself. A foreign key to the health insurance provider’s number will have to be present in each member’s (patient) record.

Q #17) List out a few common mistakes encountered during Data Modeling?

Answer: A few common mistakes encountered during Data Modeling are:

Building massive data models: Large data models are likely to have more design faults. Try to restrict your data model to not more than 200 tables.
Lack of purpose: If you do not know what your business solution is intended for, you might come up with an incorrect data model. So, having clarity on the business purpose is very important to come up with the right data model.
Inappropriate use of surrogate keys: Surrogate keys should not be used unnecessarily. Use a surrogate key only when the natural key cannot serve the purpose of a primary key.
Unnecessary de-normalization: Don’t denormalize until and unless you have a solid & clear business reason to do so, because de-normalization creates redundant data which is difficult to maintain.

Q #18) What is the number of child tables that can be created from a single parent table?

Answer: The number of child tables that can be created out of the single parent table is equal to the number of fields/columns in the parent table that are non-keys.

Q #19) Employee health details are hidden from his employer by the health care provider. Which level of data hiding is this? Conceptual, physical or external?

Answer: This is the scenario of an external level of data hiding.

Q #20) What is the form of the fact table & dimension table?

Answer: Generally, the fact table is in normalized form, and the dimension table is in de-normalized form.

Q #21) What particulars would you need to come up with a conceptual model in a health care domain project?

Answer: For a health care project, the following details would suffice the requirement to design a basic conceptual model

Different categories of health care plans and products.
Type of subscription (group or individual).
Set of health care providers.
Claim and billing process overview.

Q #22) Tricky one: If a unique constraint is applied to a column, will it throw an error if you try to insert two nulls into it?

Answer: No, it will not throw any error in this case because a null value is unequal to another null value. So, more than one null will be inserted in the column without any error.

Q #23) Can you quote an example of a sub-type and super-type entity?

Answer: Yes, let’s say we have these different entities – vehicle, car, bike, economy car, family car, sports car.

Here, a vehicle is a super-type entity. Car and bike are its sub-type entities. Furthermore, economy cars, sports cars, and family cars are sub-type entities of their super-type entity- car.

A super-type entity is at a higher level. Sub-type entities are ones that are grouped on the basis of certain characteristics. For example, all bikes are two-wheelers, and all cars are four-wheelers. And since both are vehicles, their super-type entity is ‘vehicle’.

Q #24) What is the significance of metadata?

Answer: Metadata is data about data. It tells you what kind of data is actually stored in the system, what its purpose is, and for whom it is intended.

Final Thoughts on Data Modeling Questions for Job Interview

Practical understanding of the Data Modeling concept and how it fits into the assignments done by you is much needed to crack a data modeling interview.
The most commonly asked topics in Data Modeling interview are – different types of data models, types of schemas, types of dimensions, and normalization.
Be well prepared for scenario-based questions as well.

I would suggest that whenever you are answering a question to the interviewer, you should explain the idea through an example. This would show that you have actually worked into that area and you understand the core of the concept very well.

Was this helpful?

Thanks for your feedback!

READ MORE FROM THIS SERIES:

Expert Quiz on Data Modeling Interview Questions: Test Your Skills Like a Pro

Data Modeling Interview Questions for Freshers

Interview Questions on Data Modeling for Experienced

Final Thoughts on Data Modeling Questions for Job Interview

Was this helpful?

Recommended Reading

Leave a Comment Cancel reply