2

I am trying to understand pure type 6 SCD implementation from WIKI which says mainly three points

  1. Having a Type 2 surrogate key for each time slice can cause problems if the dimension is subject to change.
  2. A pure Type 6 implementation does not use this, but uses a Surrogate Key for each master data item (e.g. each unique supplier has a single surrogate key).
  3. This avoids any changes in the master data having an impact on the existing transaction data.

However I am unable to visualize these problems clearly.

Shantanu Gupta
  • 20,688
  • 54
  • 182
  • 286

1 Answers1

4

The point of "type 6" or "hybrid" processing is that you track changes by adding a new row for each new version but then you update some of the attributes on previous versions to reflect the current state of data. That way the data can be filtered and rolled-up across all versions for reporting purposes.

There's another explanation here: http://www.kimballgroup.com/wp-content/uploads/2012/05/DT15CombiningSCD.pdf

Example:

Type 6 Example

http://www.kimballgroup.com/2013/02/05/design-tip-152-slowly-changing-dimension-types-0-4-5-6-7/

nvogel
  • 24,981
  • 1
  • 44
  • 82
  • 1
    If i am updating some if the attributes on previous version, how would it reflect the current state of data? Moreover why surrogate key is choosen to be same unlike type 2 or type 6 hybrid modelling. – Shantanu Gupta Jun 28 '13 at 03:05
  • Added linked example (notice the "Current Department Name" column) – nvogel Jun 28 '13 at 16:06
  • example you quoted is type 6 hybrid i.e. type 1,2,3. Whereas i am looking into pure type 6 scenario where there is no current, historical columns. Instead they are using only single column. Moreover in this example, i am not able to understand why are we updating current column of expired recorda. – Shantanu Gupta Jun 29 '13 at 17:38
  • @ShantanuGupta, I see. I haven't seen anything called "Pure Type 6" anywhere except on Wikipedia. Beware of taking at face value anything you read on Wikipedia. The point of it appears to be to have a surrogate key that doesn't change over time. – nvogel Jun 29 '13 at 18:29
  • By pure type 6, I mean not to have historical column. Why do we need to have historical column. – Shantanu Gupta Jun 30 '13 at 14:42
  • The historical Department Name column preserves the state of data as it was. The Current Department Name column gives the current value. So reports can be produced either based on the "as is" value or the "as was" value. For financial and regulatory reporting for example the "as was" version of data can be just as important as "as is". – nvogel Jun 30 '13 at 15:00
  • Shouldn't we have a single column i.e. Current. And whenever there is an update, we simply retire that record and make a new entry with updated values. This way we have previous key as history and latest record as current? – Shantanu Gupta Jun 30 '13 at 16:12
  • 1
    @ShantanuGupta but if you need to rollup or filter across all versions of the row then you'll need a value that's consistent across all versions. That's why you might want to retain both current and history attributes. – nvogel Jul 04 '13 at 14:41
  • But we have one attribute as a natural key. Using natural key can't we always roll up? – Shantanu Gupta Jul 05 '13 at 01:11
  • For the key attributes, yes. But in the example given the Current Department Name isn't a key attribute. – nvogel Jul 05 '13 at 05:31