I think it's time for my first question on stackoverflow. Until now i've always found an answer to my questions, but now I have a very specific question for fact table design. Right now i have one fact table and the question is, if i should split it up into three fact tables. Currently the table looks something like this:
d_date | d_customer | d_filter | d_x | d_y | metric 1 | metric 2 | metric 3
The data for my database is returned from the source system with three different queries:
- query 1 returns data with dimensions date, customer and filter. This one returns the totals per customer.
- query 2 returns data with date, customer, filter and
d_x
. - query 3 returns data with date, customer, filter,
d_x
andd_y
.
Specialty: The aggregates for query 2 and 3 are different to each other and also different to query 1. But that is fine, the source system just works that way.
Because I don't need all dimensions for all datasets I created dummy rows in the dimension. So for query 2 there is a dummy row in d_y
and for query 1 there is a dummy row in d_x
and d_y
.
But as queries get slow I started reading books about best practices and red that you should split up fact tables by granularity. Maybe there is no real best practice for this problem because the source system is very strange, but one option must be better than the other in this case. Keep one fact table or split it up into three fact tables.
I hope you can understand the problem.