0

I am using Data vault 2.0 model. Here we are not supposed to use UPDATE in a table. Normally in RDBMS, we implement SCD-2 using UPDATE & INSERT strategy. But in my case, I can use only INSER strategy. My Source is Kafka input and its loaded into Snowflake (as avro format) and I'm flattening and loading the same into another table in RDBMS format.

My question is: Can anyone please help me to implement SCD type-2 logic without using UPDATE. Only with INSERT strategy I have to do this.

I should keep the history also. For example, if I get rec-1 in interval-a and in interval-b, I got 2 updates for rec-1, then I need to load all the three records in my target pointing the last record came in will be the latest one.

Jomy
  • 87
  • 1
  • 13
  • Here is a reference: [How to implement insert only in Data Vault 2.0?](https://blog.scalefree.com/2018/11/12/insert-only-in-data-vault/) Basically, with INSERT strategy, you only insert the start date but not the end date. You then calculate the end date using a SQL statement and a window function. – Maja F. Jan 18 '22 at 17:41
  • I come from the Snowflake side, not the data-vault, is the INSERT only to prevent "data tampering/alterations" or because you cannot use other commands like `MERGE`? – Simeon Pilgrim Jan 18 '22 at 20:41
  • And can you run a secondary process to "build a correctly ranged (start/end)" table, so you don't have to use a window function on every read? I was thinking a materialized view, but those cannot use LAG/LEAD. – Simeon Pilgrim Jan 18 '22 at 20:44
  • You seem to be mixing 2 different modelling approaches. SCDs are a concept in dimensional modelling whereas Data Vault doesn’t have the concept of dimensions and therefore doesn’t have SCDs – NickW Jan 18 '22 at 23:44
  • Thanks for you inputs and the link @Marcel. But yes, as Maja has told, there is an UPDATE in the link that you have shared. – Jomy Jan 19 '22 at 01:38
  • @SimeonPilgrim As per Data vault "rule", we should not be using UPDATE (even if we run an UPDATE it will work still). Only INSERT are recommended inside it. Yes, we can use other functions also, MERGE is allowed. – Jomy Jan 19 '22 at 01:39
  • @NickW We can implement SCD Type-2 in Data vault also. But its a tricky process. it cannot be achieved directly/easily as we do in normal dimensional modelling. – Jomy Jan 19 '22 at 01:41
  • You have to omit the update. The update is there because the author implemented an End date in his satellite, which is not DV standard. Instead you just use the start date and then INSERT is enough. End date can be calculated with a view on top. – Marcel Jan 19 '22 at 07:58

1 Answers1

-1

Your satellite key needs to consist out of the Hash Key + the Load Date Timestamp. Each individual version will then have a new primary key and thus can be inserted into the satellite. By querying the latest timestamp, you will get the latest version.

You can find a sample script here: https://simplesqlbi.home.blog/2019/06/29/part-3-data-vault-for-beginners/

Marcel
  • 2,454
  • 1
  • 5
  • 13
  • This is not very useful. The sample script that is referenced uses an UPDATE and an INSERT statement to load the satellite. The question was whether the satellite can be loaded without using UPDATE. – Maja F. Jan 18 '22 at 17:29
  • DV 2.0 standard says you have to use start date, but NO end date. And this means that INSERT is working and UPDATE is not needed. So the link is correct. :) – Marcel Jan 19 '22 at 07:56