0

Imagine that we have two different messages: CarDataLog CarStatusLog

CarDataLog contains data which has a direct relation to a car and the corresponding Person and contains data about the car.

CarStatusLog contains data about the same car as mentioned above which had a customer in the log included. But this time the data is a status. For a field like: "CleaningState": "NotCleaned" or "Cleaned".

Both of the log messages contain a Car_ID. Would we create one Fact table with the foreign keys to Car and Person and have the risk the person_id is null sometimes because it is not given.. Or would a better approach be to create two fact tables with the risk of having the 'grain' spreaded out?

The use case would be: get data for a specific car, including the states it had and the Person first name.

I am new to data warehousing and I hope someone can assist me with this issue?

Rvh
  • 43
  • 6

2 Answers2

2

A standard practice in data warehousing is to make a dummy row for dimension tables that is used to match "UNKNOWN" data. This prevents NULLS in the foreign keys in the fact table.

Depending on your use case, you may have multiple types of "UNKNOWN" data. For example, you could use a key of -1 for "UNKNOWN" and -2 for "NOT APPLICABLE" dimensional data.

See also: https://www.kimballgroup.com/2010/10/design-tip-128-selecting-default-values-for-nulls/

saritonin
  • 247
  • 2
  • 18
1

You need dims as Car_dim, Person_dim, Status_dim (as values CleaningState,NotCleaned" or "Cleaned), and Date_dim. Person_dim can have a row of "Unknown" person name when you get a null person name.

Dim and Fact tables have parent/child relationship that means you have to load data in Dim first (Dim is a parent) and then you load into a Fact (child) table.

Load dim IDs from above Dims in your Fact table based on the data you get. Make sure the 2 logs you have date fields in them so you can join both logs on a Car_id and when a date in both logs matches for that Car_id. If you get a scenario when a Car_id exists in CarDataLog but not in CarStatusLog, then you need to create a row of "Unknown Status" in the Status_dim so you can use it in the Fact table. Good Luck!

Sam Kaz
  • 7
  • 1