0

In https://www.linkedin.com/pulse/data-lt-mysteries-effectivity-satellite-driving-key-patrick-cuba/?trk=read_related_article-card_title Patrick Cuba showed how the insert-only logic work when populating an effectivity satellite.

However when applying this to an event stream where we have out of order events, the ingestion and querying logic seems to explode in complexity.

Example

Consider this series of change events depicting change in a person's employment relationship.

person_id employee_id src_event_time change_type
A X t1 INSERT
A X t2 DELETE
A X t3 INSERT

Consider the following scenario where the event come in the order of t3, t1, t2 and we're ingesting it into the effectivity link satellite table:

1. Processing event t3

person_id employee_id effectivity_start_date effectivity_end_date
A X t3 t9999

2. Processing event t1

Here in our ingestion logic we can identify that the new event time t1 is before the existing event t3. So we set the effectivity to between t1 and t3

person_id employee_id effectivity_start_date effectivity_end_date
A X t3 t9999
A X t1 t3

3. Processing event t2

This is where things get a bit weird. Our logic will have to detect that there are existing effectivity period (from t1 to t3) and then "correct" that by adding a (t1 - t2) effectivity period. Since our constraint is that we only ever INSERT and not update existing rows, we now have two effectivity rows with the same start time t1.

person_id employee_id effectivity_start_date effectivity_end_date
A X t3 t9999
A X t1 t3
A X t1 t2

Question

Is this the correct approach? How would I go about querying the state of a relationship at a given point in time? (For example, how would I query the relationship at t1.5, or t2.5?)

Jacob Wang
  • 4,411
  • 5
  • 29
  • 43

1 Answers1

0

It’s not quite correct Data Vault Guru contains better examples, load code and query code

  • 1
    I do not see how this answers the question at the top of this page, but it should. Please [edit] according to [answer] or delete the answer. Note especially that only mentionging a tool/lib, without showing how it can be used to solve the problem, is not considered an answer. Otherwise it risks being flagged as "not an answer" and being deleted. – Yunnosch Jul 04 '21 at 06:23
  • Hi Patrick Cuba. I see that you have been online since I commented first time on your answer, but did not react or ask for clarification. Please [edit] your post to make more obvious how this is answering the question at the top of this page. You might want to improve your contribution by splitting it into more and smaller complete sentences. – Yunnosch Jul 08 '21 at 06:20