0

Currently, I am participating in the design of a new data warehouse. I am quite new to this topic and have a general question regarding Star Schema and Many-to-Many relationships, in particular Many-to-1/2 relationships.

I would like to illustrate my problem with a short example, e.g. a sales data mart. I have a fact table which is on invoice level and contains measures like total sales in $, VAT... For each of these records I have at least one seller and at the maximum two sellers. Both sellers have the same attributes so that only one simple dimension for sellers is needed. How would you model this?

I can imagine the following three different approaches:

  1. Joining via a Bridge Table - I prefer that one, but I am not sure about this because of the additional effort the bridge table causes through the additional join, especially for large dimensions and especially in a case like this, where a record in the fact table is only associated with one or two records in a dimension table.
  2. Introducing two seller foreign keys into the fact table - I would use this approach only if there is in general a significant difference between both sellers. E.g. the first seller is always that one who is responsible for a product line (product line seller) and second one is always the key account manager of the customer. One the other side, this approach makes it more difficult to query the data warehouse, for example when I need the total sales of all sellers summed up for the current year regardless their role.
  3. One record in the fact table per seller - I create a duplicate for the records which have two sellers. Through that approach I can avoid an additional join compared to the bridge table, but my fact table would be larger. In addition, during the creation of a query or report I have to consider and eliminate the duplicates if necessary. So this approach also makes it more difficult to query the data.

Do you have any considerations regarding this? It would be great if you could share some knowledge. Thanks a lot.

moewe
  • 1
  • 1
  • There seems to be a fundamental problem here that if you have two sellers, A and B, who are associated with an invoice valued at GBP10, then when you run a report of seller and invoice value the invoice value is duplicated. This happens independently of whichever of the schemes you choose (but especially "when [you] need the total sales of all sellers summed up for the current year regardless their role") , and suggests to me that there is something wrong with the model in general. How many different roles can there be for sellers? – David Aldridge Aug 12 '15 at 14:23
  • Thanks for the comment. I think you are right about the underlying model, but unfortunately we can not change it. There are two roles for the sellers, but they are not reliable. Often the roles are mixed up and there is no way for us to identify problematic records and to correct mixed-up roles. Because of that we want to treat them as equal, which is luckily sufficient for our reporting. – moewe Aug 13 '15 at 06:16
  • I would go for option 1 then. – David Aldridge Aug 13 '15 at 10:32

1 Answers1

1

This would be my take on this based in my experience.

1) I would put the two sellers in the same row, this will make far easier to manage your aggregate metrics, for example.

2) Whenever there is only one seller, put a code representing "not applicable" to the field representing the other seller. Insert this "not applicable" row in your sellers dimension. This will ensure referential integrity and will give useful information about that row.

3) Make separate dimensions for seller1 and seller2. Just make one table of sellers and set up a VIEW to that table so you end up having 2 different dimensions but only one physical table.

4) Regarding "this approach makes it more difficult to query the data warehouse, for example when I need the total sales of all sellers summed up for the current year regardless their role." you can inform the same seller on both dimensions to give you the answer.

DW's evolve, your knowledge about your domain evolves too, keep in mind that. Maybe this design would need to be upgraded sometime, but you never know. The only thing you have to do is to accept change and be prepared for that.

Best regards and good luck.