2

Using a basic star schema, I have been told that a fact table would have at least the amount of rows equal to the product of the number of rows in each dimension.

For example, 3 products, 5 promotions, and 10 stores would mean that the fact table should have at least 150 records, regardless of where or not a product actually had every promotion or exists in every store. Specifically, null values would exists where for example, a product does not have a specific promotion and etc.

Can someone please provide an academical source that supports, or in the least, please just confirm this idea.

The reason why I am asking this is that my understanding tells me this would create a MASSIVE amount of useless data in the fact table.

Thanks!

Luke
  • 776
  • 9
  • 24

3 Answers3

1

Hi thanks for the replies. I consulted my lecturer and he actually found a page reference for me: "...Take a very simplistic example of 3 products, 5 customers, 30 days, and 10 sales representatives represented as row in the dimension tables. Even in this example, the number of fact table rows will be 4500, very large in comparison with the dimension table rows..." (Ponniah, P., 2009. Data warehousing: Fundamentals for IT professionals, 2nd Edition. John Wiley & Sons, Inc., New Jersey. p. 237)

However, the author goes on to say that: "We have said that a single row in the fact table relates to a particular product, a specific calendar date, a specific customer, and an individual sales representative. In other words, for a particular product, a specific calendar date, a specific customer, and an individual sales representative, there is a corresponding row in the fact table. What happens when the date represents a closed holiday and no orders are received and processed? The fact table rows for such dates will not have values for the measures. Also there could be other combinations of dimension table attributes, values for which the fact table rows will have null measures. Do we need to keep such rows with nulls measures in the fact table? There is no need for this. Therefore it is important to realize this type of sparse data and understand that the fact table could have gaps."

In short, you guys seem to be correct, thanks!

Luke
  • 776
  • 9
  • 24
  • Hopefully your lecturer doesn't believe that every customer purchases every product every day from every sales representative. Fact table rows generally do not exist where there is no fact (such as a sale) to record. – David Aldridge Jun 13 '13 at 21:51
  • @David Oddly enough, I think the example of a Snapshot Fact that Kimball uses in one of his books is almost exactly what you described: a CRM warehouse where a row is populated for every customer every day whether or not the customer completed a transaction. As with so much of what Mr Kimball says, I treat that with a healthy dose of scepticism and caution. – nvogel Jun 13 '13 at 22:22
  • @David, actually, my lecturer did... but i will take it with a pinch of salt. Apparently its value becomes apparent in data mining next semester – Luke Jun 20 '13 at 19:44
0

Of course not. I suggest you ask your source to clarify this claim, it sounds as if there is a missunderstanding somewhere here.

And what if you add a time dimension..? Also it is not even possible to have null values as keys where i.e. promotions are missing, because the reason for the key is to point to a dimensional value, wich a null value isn't doing.

The dimension values are there to support whatever facts you have, not the other way around.

whetstone
  • 635
  • 6
  • 11
0

This may relate to a specific kind of fact table: the pattern that Ralph Kimball terms a Periodic Snapshot Fact Table. That is where the fact table repeats an entire population of rows for each point in time. IMO the usefulness of that approach is extremely limited.

A Snapshot Fact Table does not implicitly require that the fact table is the product of its dimensions but it does pose the potential problem of what the correct population of each snapshot should be. The cross product of dimensions is one way to do it I suppose.

nvogel
  • 24,981
  • 1
  • 44
  • 82