Sould I split up fact table by different source queries? Datawarehouse best practice

Question

I think it's time for my first question on stackoverflow. Until now i've always found an answer to my questions, but now I have a very specific question for fact table design. Right now i have one fact table and the question is, if i should split it up into three fact tables. Currently the table looks something like this:

d_date | d_customer | d_filter | d_x | d_y | metric 1 | metric 2 | metric 3

The data for my database is returned from the source system with three different queries:

query 1 returns data with dimensions date, customer and filter. This one returns the totals per customer.
query 2 returns data with date, customer, filter and d_x.
query 3 returns data with date, customer, filter, d_x and d_y.

Specialty: The aggregates for query 2 and 3 are different to each other and also different to query 1. But that is fine, the source system just works that way.

Because I don't need all dimensions for all datasets I created dummy rows in the dimension. So for query 2 there is a dummy row in d_y and for query 1 there is a dummy row in d_x and d_y.

But as queries get slow I started reading books about best practices and red that you should split up fact tables by granularity. Maybe there is no real best practice for this problem because the source system is very strange, but one option must be better than the other in this case. Keep one fact table or split it up into three fact tables.

I hope you can understand the problem.

score 1 · Answer 1 · answered Mar 05 '16 at 23:29

Broadly without knowing more detail, I would say yes, you should have 3 fact tables.

A fact table should as a general principle, have data at the same grain and you are artificially creating the same grain for some fact records by adding dummy dimension values.

If you want/need to report these together, I would create them as three separate fact tables, then create a view(s) over the fact tables that does the combining.

Sould I split up fact table by different source queries? Datawarehouse best practice

1 Answers1