The right way to model multiple FACTs schema

Question

Background

I'm in a process of designing a database (using a STAR schema).

There are three tables to model: products, tests, states.

The database will be used to store results of tests conducted on products (in a great simplification). There can be many tests pointing out to a single product but each test is unique (they are not shared among products). Besides, I need to record the current state of the product, at the time when the test was conducted. Let's assume that the state of a product describes its current location and owner, which are changing very often. That will most likely involve SCD lvl 2 - to track the history of state changes and to be able to locate a product with all its tests as well as the states it had during these tests.

Problem

I'm not entirely sure how to model this problem. It seems obvious to store every test in a FACT table. This table would then consist of thousands of transactions. On the other hand, there will also be hundreds (and later thousands) of products, so I should probably keep them in a second FACT table. Then, there will also be thousands of state changes, so in order to record their entire history, I would need to keep them in a ... FACT table as well? I've been told that FACT tables are typically used to store multiple-rows data but on the other hand where are the DIMs in this model?

I also don't know how to model the relationships between these tables. Products - states is a 1:* relationship. Products - tests is a 1:* as well. Finally, states - tests is a 1:* too. I would then go with linking products to states and then states to tests (products 1<-* states 1<-* tests), what would allow me to find all states for a particular product and all tests (in all states or in a selected state). What do you think about that? The problem here is that, as I keep adding states, I have two options: either keep duplicating products in the products table (with added "recorded_timestamp" column) or use a SCD lvl 2 in states table, pointing out to the products table with a FK, but this would effectively make the product table a DIM!

Any help here would be very appreciated.

score 0 · Answer 1 · answered Aug 01 '20 at 19:45

Fact tables contain the measures that you wish to report on i.e. numbers you can count, sum , average, etc. They also contain the foreign keys to the dimensions that hold the attributes by which you wish to slice and dice the measures. You need to decide what measures you want to track and their granularity - if all measures have the same granularity then it may be possible to have a single Fact table, otherwise you may need multiple Fact tables.

For dimensions it sounds like you would have Products (probably SCD2 as you need to track product state over time), Tests (assuming that tests have attributes) and whatever other dimensions you need, such as Date or Time

score 0 · Answer 2 · edited Sep 21 '20 at 18:19

It is critically important to understand the difference between a fact and a dimension. A fact contains the measurable aspects of an event. The dimension contains the descriptive possibilities of an event. A dimension contains records even if an event has never happened (some exceptions apply). My favorite example is a menu (dimension) in a new restaurant before the first order (event) is placed.

HOWEVER, another critical understanding is that an event can also be used as a dimension for descriptive aspects of the event. These are generally referred to as degenerates dimensions and are used to associate different aspects of an event, or sub-events. The most common example is an Invoice ID used to tie together all the items purchased in the invoice. The ID is descriptive, not measurable, but does not exist until the Invoice event occurs and has no purpose outside of the event.

A great way to identify your Dimensions and Facts is to ask questions of the model while designing it.

Has a Product passed all the tests?
How many tests have been run on a Product?
Which Tests are required for a new Product?
Have all the required Tests been run on a Product?
How many times was a Test executed for a Product?
What is the pass/fail ratio for Test 123 across all Products?

As you answer the question, identify what you are filtering or grouping on (Dimension Attributes) and which measurements you're needing (Facts), such as number of executions, to answer the questions.

Based on your limited description, I could see Products, Tests and States all being dimensions. I'd then see a TestExecution event, needing the Product and Test dimension and creating the TestExecution ID used as the key for a newly created DimProductTestExecution. The FactProductTestExecution would point to DimProductTestExecution, Product, Test & State dimensions, along with Date and Time dimensions and probably some user dimensions (Who executed the Test? Who approved the test result, etc).

The right way to model multiple FACTs schema

2 Answers2