0

I have a case where I'm building factless fact table for my DWH. There are 2 dimension that I want to ask for this case: location and store. I have 2 approach.

  1. Building dim_store and fact_account. Then put all the location data into the fact_account table

  2. Building dim_store, dim_location, and fact_account. Then put the store_id and location_id on the fact_account

Here is the visualization for these 2 approaches:

1. FIRST APPROACH

2. SECOND APPROACH

Which is the best approach and why?

Thank you in advance.

Mohammad Iqbal
  • 317
  • 4
  • 13
  • Hi - a dimensional model should be designed round your reporting requirements, so without knowing what those are it is difficult to comment in a design. Can a location only be associated to a store (and not a residence/office/factory/etc)? If so then why not denormalise the location data into the store dimension? BTW option 1 is definitely wrong as a fact table should never contain attributes unless they are degenerate dimensions? – NickW Oct 18 '20 at 17:06
  • The first approach is called **denormalization**. And BTW you have a third option where you *denormalize* both dimensions and ends only with the fact table. A *denormalization* is neither good nor bad. It depends on you query and processing pattern. You need more storage but you can speed up some queries. And - do not forget - *you* are responsible for keeping consistency in the denormalized schema. – Marmite Bomber Oct 18 '20 at 17:06
  • @NickW I'm going to have other dimensions such as suppliers, warehouse, etc etc and consolidate all on fact_account. – Mohammad Iqbal Oct 19 '20 at 01:49
  • @MarmiteBomber Yup I know the denormalization process and thank you for pointing that out. So if I have other fact tables - for example fact_sales and fact_order that contains store and their order, I'm not maintaining the store on the dimension, but instead on the fact tables? – Mohammad Iqbal Oct 19 '20 at 01:53

1 Answers1

2

Option 1 is definitely wrong, what is described there is not a dimensional model.

Option 2 is a correctly designed dimensional model. Whether it is the best way to dimensionally model your data depends on your reporting requirements.

NickW
  • 8,430
  • 2
  • 6
  • 19