
Having read many of Kimball's design tips on fact tables (transaction, accumulating, periodic, etc.), I'm still unclear about how to handle my case of updating a fact table, which I believe is not uncommon. Here's the case.

We're processing complaints from clients, and we want the Data Warehouse to reflect the current status of each complaint. Our complaints follow a workflow of statuses and pass through different assignees over time, but for our analysis this is irrelevant for now. We simply want to review the current situation of each complaint.

To my understanding, the grain of the fact table would be a single complaint, with columns (whether these should be a junk dimension, degenerate dimensions, etc. is irrelevant for this question) such as:

  • Complaint Number
  • Current Status
  • Current Status Date
  • Current Assignee
  • Type of complaint

As far as I understand, since we don't want to view the process history but only the current status of the process, storing multiple rows per complaint to represent its states is overkill, so instead we store a single row per complaint and update it.

Now, is my reasoning correct? In the above case, Complaint Number and Type of Complaint hold values that don't change, while the "Current" columns do, so we need to update the row. We could implement a Change Data Capture mechanism (just as we do for dimensions right now) to compare incoming rows from the source system with the currently stored fact rows and reduce the cost of the operation.
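That compare step could be sketched roughly as follows. This is a minimal Python sketch, assuming illustrative column names; the in-memory dictionaries stand in for the real staging and fact tables:

```python
# Hypothetical change-detection step: compare incoming complaint rows
# against the stored fact rows and touch only those whose "Current"
# columns actually changed. All names and sample data are illustrative.

stored = {
    "C-1001": {"status": "Open", "status_date": "2019-06-01",
               "assignee": "Alice", "type": "Billing"},
    "C-1002": {"status": "Closed", "status_date": "2019-06-10",
               "assignee": "Bob", "type": "Delivery"},
}

incoming = [
    {"complaint": "C-1001", "status": "In Progress",
     "status_date": "2019-06-27", "assignee": "Carol", "type": "Billing"},
    {"complaint": "C-1002", "status": "Closed",
     "status_date": "2019-06-10", "assignee": "Bob", "type": "Delivery"},
]

MUTABLE = ("status", "status_date", "assignee")  # the "Current" columns

def detect_changes(stored, incoming):
    """Split incoming rows into new complaints and changed complaints."""
    inserts, updates = [], []
    for row in incoming:
        current = stored.get(row["complaint"])
        if current is None:
            inserts.append(row)                       # brand-new complaint
        elif any(row[c] != current[c] for c in MUTABLE):
            updates.append(row)                       # a "Current" column moved
    return inserts, updates

inserts, updates = detect_changes(stored, incoming)
# Only C-1001 changed, so only one row needs an UPDATE.
```

In a real ETL this comparison would typically be pushed into the database (e.g. a merge/upsert keyed on Complaint Number), but the logic is the same: unchanged rows are skipped entirely.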

Honestly, it looks to me like a dimension table with mixed SCD Type 0 and Type 1 behavior, except that it stores facts about receiving complaints.

SO Post for reference: Fact table with information that is regularly updatable in source system

Edit

I'm aware that I could use an accumulating snapshot fact table with timestamps, which is somewhat SCD Type 2-like, but the end user doesn't really care about the history of the process. There are more facts involved in the analysis later on, so keeping this need out of the data warehouse doesn't really work in this case.

Kamil Gosciminski

1 Answer


I’ve encountered similar use cases in the past, where an accumulating snapshot would be the default solution.

However, the accumulating snapshot doesn't handle processes of varying length. I've designed a different pattern, where two rows are added for each event: if an object goes from state A to state B, you first insert a row with state A and quantity -1, then a new one with state B and quantity +1.

The end result allows:

  • no updates, only inserts;
  • map-reduce friendly loading;
  • processes of arbitrary length;
  • counting how many objects are in each state at any point in time (with the help of a periodic snapshot for performance reasons);
  • counting how many entered or left any state at any point in time;
  • calculating time spent in each state and overall age.
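As an illustration of the pattern, here is a minimal Python sketch (the row layout and helper names are my own assumptions, not the implementation from the blog series): each transition emits a -1 row for the state being left and a +1 row for the state being entered, so the population of each state at any point in time is a plain sum over the quantity column.

```python
from collections import Counter

def transition_rows(complaint, old_state, new_state, event_date):
    """Emit the two fact rows for one status change (one row for the
    first event, which has no state to leave)."""
    rows = []
    if old_state is not None:
        rows.append({"complaint": complaint, "state": old_state,
                     "qty": -1, "date": event_date})
    rows.append({"complaint": complaint, "state": new_state,
                 "qty": +1, "date": event_date})
    return rows

# Insert-only fact table built from a few sample events.
fact = []
fact += transition_rows("C-1001", None, "Open", "2019-06-01")
fact += transition_rows("C-1001", "Open", "In Progress", "2019-06-05")
fact += transition_rows("C-1002", None, "Open", "2019-06-02")

def states_as_of(fact, as_of):
    """How many objects sit in each state at a given date: just sum qty
    over all rows up to that date (ISO dates compare lexicographically)."""
    counts = Counter()
    for row in fact:
        if row["date"] <= as_of:
            counts[row["state"]] += row["qty"]
    return {state: n for state, n in counts.items() if n}
```

Because the table is insert-only and the quantities are additive, the same sum also answers "how many entered or left state X in period Y" by filtering on sign and date range.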

Details are in a series of 5 blog posts here (with an implementation in Pentaho Data Integration):

http://ubiquis.co.uk/dwh/status-change-fact-table-part-1-the-problem/

nsousa
  • Thanks for your post. I believe your take on it is a modification of Kimball's proposed approach to making the transaction fact table SCD2-like: https://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/ I also considered this, but didn't want the overkill of gathering history, since the end user really doesn't seem to care about the process, which I find unfortunate, though. – Kamil Gosciminski Jun 28 '19 at 13:22
  • Yes, but not quite. It doesn't need to account for updates, and it provides net counts more easily. – nsousa Jun 28 '19 at 18:28