0

Say I want to implement SCD type2 history dimension table (or should I say table with SCD type2 attributes) in the DWH system which for now I has been implementing as a "usual table" with a natural key + primary surrogate key + datefrom + dateto + iscurrent additional columns.

where

  • the primary surrogate key is needed in order to use it as a foreign key in all fact tables and
  • datefrom + dateto + iscurrent columns are needed in order to track a history.

Now I want to use a system-versioned temporal table in the fact-dimension DWH design, but MSDN is said that:

A temporal table must have a primary key defined in order to correlate records between the current table and the history table, and the history table cannot have a primary key defined.

So it looks like I should use a view with a primary surrogate key generating "on the fly" or another ETL process, but I do not like the both ideas...

Maybe there is another way?

Andrey Morozov
  • 7,839
  • 5
  • 53
  • 75
  • 1
    The [docs](https://docs.microsoft.com/sql/relational-databases/tables/temporal-table-usage-scenarios#slowly-changing-dimensions) work out an explicit scenario for a data warehouse using temporal tables. You'd be linking to the main table, not the history table, you'd drop the surrogate key entirely (or, if you still have one for storage reasons, it would be in a 1-1 correspondence with your natural key), and you need to change your queries to incorporate the time. For an existing warehouse, adopting system TTs may not be the most logical step -- it's a bunch of work for possibly negative gain. – Jeroen Mostert May 09 '18 at 12:19
  • You can't manually set (i.e. backdate) the dates in a temporal table so it's not very useful for SCD2. These dates are set to the insert time. This is not always what you want to do – Nick.Mc May 09 '18 at 12:46
  • 1
    @Nick.McDermaid: it is actually possible to work around this, by turning off system versioning and the period, inserting your data and then turning it on again, but of course that just adds more work to the data loading when it was supposed to make things simpler... (and presumably it's very inefficient if you keep consistency checks on with this). In an ETL warehouse system, system versioning isn't a good match. – Jeroen Mostert May 09 '18 at 12:54
  • Cool I learnt a few new things tonight. – Nick.Mc May 09 '18 at 13:20
  • Thank you @JeroenMostert for the link – Andrey Morozov May 10 '18 at 06:18

1 Answers1

1

You would use a Temporal Table in the persistent staging area of your data warehouse. Then you can simply apply changes from the source systems, and not loose any historical versions.

Then when you are querying, or when building a dimensional datamart, you can join facts to the current or to the historical version of a dimension. Note that you do not need surrogate keys to do this, but you can produce them to simplify and optimize querying the dimensional model. You can generate the surrogate key with an expression like

 ROW_NUMBER () OVER (ORDER BY EmployeeID, ValidTo) AS EmployeeKey

And then joining the dimension table when loading the fact table as usual.

But the interesting thing is that this can defer your dimensional modeling, and you choice of SCD types until you really need them. And reducing and deferring data mart design and implementation helps you deliver incremental progress faster. You can confidently deliver an initial set of reports using views over your persistent staging area (or 'data lake' if you prefer that term), while your design thinking for the datamarts evolves.

David Browne - Microsoft
  • 80,331
  • 6
  • 39
  • 67