1

In a datamart where 2 or 3 or more dimensions merge to form a new dimension. How can the scd be managed to keep track of all mergers historically and present trending facts as related to those dimensions?

A specific example would be three stores (Business ID 8897, 8965, 9135) merge to create a new store Business ID 9700. How can historical sales data be fetched from the fact table to show that up until a given date 8897, 8965 and 9135 were separate stores which are all now a new store 9700.

Additionally what if the business number of the new store is not 9700 but the new store takes on one of the previous stores business ID. So instead of 9700 the new merged store business ID is 8897.

SurrogateKey -------- StoreBusinessID---------- StoreName
=============== =================== ==================

          1                8897   Alpha Electronic  
          2                8965   Beta Electronics  
          3                9135   Gamma Electronic  
          4                9700   Mega Electornics  

=============== =================== ==================

2 Answers2

1

You need to use "Type 6" slowly changing dimension. Essentially, it's a combination of Type 2 and Type 1 change.

The way it works: To capture "Type 2" change, you will need to have "Start Date" and "End Date" for each record in your "Store" table. For current records, end date is usually some distant future date like 2999-12-31. When you connect such table to fact tables, you will need to join dimension and fact on natural key (StoreBusinessID) and on fact date between start date and end date.

To capture "Type 1" change, you will have to add a field to your "Store" table to track "Latest Business ID". This field will contain the latest version of the business key. If there is no change, Store Business ID and Latest Business ID will contain the same key. If the stores merge, records 8897, 8965, 9135 and 9700 will all contain "Latest Business ID" 9700.

As a result, you can both "travel in time" (reproduce exact history in any period), and group by the latest version of the store.

RADO
  • 7,733
  • 3
  • 19
  • 33
  • 1
    A Type 3 is when you add a new column to track multiple alternative realities at once- they're not the same as a combination of Type 2 and Type 1- as per https://www.kimballgroup.com/2008/09/slowly-changing-dimensions-part-2/. This is more akin to a Type 6. – Rich Mar 17 '18 at 17:18
  • I see the value of Type 6. The scenario with one iteration of store joins is working, I am not sure what happens in the second iteration and how to keep track of it. So taking the example the "Latest BUsiness ID" 9700 is stamped for 8897,8965,9135 and 9700. What happens when 9700 merges with say 9900 after 6 months. In this case the stores 8897,8965,9135 and 9700 will have Latest Business ID as 9900. How will one know that for 6 months 8897, 8695, 9135 and 9700 were 9700? and not 9900? – Faisal Majeed Mar 19 '18 at 19:41
  • If you need to navigate a full history of store mergers, you will have to use "unbalanced hierarchy". This is a very advanced and complex solution involving recursion; before you decide to do it, make sure it's really worth it. If it is, these articles can point you to the right direction: http://blog.chrisadamson.com/2012/05/recursive-hierarchies-and-bridge-tables.html, or http://brazenly.blogspot.com/2015/02/datawarehouse-dimensional-modeling.html – RADO Mar 19 '18 at 21:50
  • @RADO Thank you, I believe your latest suggestion of using unbalanced hierarchy fits the bill, but definitely at a price of added complexity. I believe with this knowledge now I can revisit the requirements set the business and articulate what complexity they will be getting for the features they are requesting and logically show why it has to be that complex. I think the Type 6 would be adequate if we compromise on the point-in-time requirement and stick with AS-Was and AS-Is reporting based on effective dates. though I might change from using Latest Business ID to a durable key reference – Faisal Majeed Mar 20 '18 at 14:37
0

The fact tables already hold data related to Stores with SK 1, 2 and 3 (ID 8897, 8965 and 9135), so you can leave the fact tables as they are and you'll always have data on the past performance of those stores, up to the date when they were discontinued.

When that new store (ID 9700) appears you can add it as a new row (SK 4) in the Store dimension, set it as the active row and set the "discontinued" store rows as inactive (you can use a boolean column for this, e.g. "isActive" or "version"). From this point in time, all data loaded into the fact tables will point to the store with SK 4 (ID 9700).

Before that point in time, it only had data for the SK 1, 2 and 3 (ID 8897, 8965 and 9135). For this reason, you'll always be able to look at data from past stores and/or present stores, by using the isActive field in the Store dimension when you join it with the fact tables (using the SK).

This is known as a Slowly Changing Dimension Type 2:

This method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. Unlimited history is preserved for each insert.

Edit: regarding the second part of your question, "the flexibility required is that I could pull it for 9700 before 9700 existed in history by merging SK 1, 2 and 3"

One simple option would be to introduce a hierarchy (parent store -> store) into the Store dimension and, for each of the other stores, set the "parent store ID" field to 9700. This way you can explore sales data specifically for each of the past stores, by using their IDs, or just explore data for the parent store (parent ID = 9700) which will provide an aggregate view of all the data for all stores under parent 9700.

This has the added benefit of making the data in the Store dimension accurately depict what happened: stores were merged into a single entity, and no historic data was lost for any of them (which could happen if we overwrite their IDs).

jmng
  • 2,479
  • 1
  • 25
  • 38
  • 1
    The issue with treating this as simple SCD 2 is that it is not showing anywhere that the three stores merged to form the one store 9700. When I pull sales report, the flexibility required is that I could pull it for 9700 before 9700 existed in history by merging SK 1, 2 and 3 (ID 8897, 8965 and 9135). Since the merger information is not getting captured by SCD Type 2. I will not be able to show a stitched history of data. – Faisal Majeed Mar 14 '18 at 15:31
  • Then why not 1) update the dimension as an SCD and add a row to represent the store "9700+8897+8965+9135" ; 2) create a new fact table, e.g. FactSalesMerged, that contains a replica of the sales metrics but points to the new member of the dimension. This way you an refer to this specific fact table when you need an historic and merged view of the sales data. – jmng Mar 14 '18 at 18:26
  • Another option would be to introduce an hierarchy (parent store -> store) into the Store dimension and, for each of the other stores, set the "parent store ID" field to 9700. This way you can explore sales data specifically for each of the past stores, or just explore for the parent store (which will provide an aggregate view of the data). Added as an edit to the answer. – jmng Mar 14 '18 at 18:59
  • 1
    I am not sure about the FactSalesMerged table you have suggested. I am leaning more towards your second option of "Parent Store Key". Which is like introducing a durable key to the dimension table, and adding another column in the fact table for Parent Store Key. This way I think data could be retrieved for either the current store or the parent store. In essence, trickle the SK4 key and make it parent store Key for the merged stores. But what happens when later on Sk4 splits to two stores, Sk5 and Sk6 what is the parent store key for sk5 and sk6 and also sk4 do we update all? and to what? – Faisal Majeed Mar 15 '18 at 18:18
  • In reality, the proposed business/domain concept seems to exist in your case: some stores will be merged into a "parent" store, some won't. You can have the concept of stores without parent stores for SK5 and SK6, and these could have the parent store ID equal to the child store ID. – jmng Mar 15 '18 at 18:45
  • Ok so SK5 and Sk6 would have the parent store id as their own IDs, but in this case how will we know that SK5 and SK6 came from SK4? Maybe I am missing something – Faisal Majeed Mar 15 '18 at 19:15
  • I misunderstood your previous comment. In the case you mentioned, if store 4 is splitting into store 5 and 6, you could set 4 as the parent for 5 and 6 to keep track of the split. Another option would be to model this as a many-to-many relationship: use a bridge/fact table to join the dimension with itself and use the bridge/fact tables columns/keys to capture the parent of each store and the date of change. – jmng Mar 15 '18 at 19:37