
I am starting to build a star schema, and I like it ^^

I have a design problem with dimensional modeling.

I have a fact table with one row per transaction (the finest grain) in the star schema. Something like this (simplified version):

transaction_facts
- id
- account_dim
- date_dim
- status_dim
- amount

status_dim
- id
- code
- description
- final
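
In SQL, the two tables might look roughly like this (the column types are my assumption, not part of the original design):

    CREATE TABLE status_dim (
        id          INTEGER,
        code        VARCHAR(10),
        description VARCHAR(255),
        final       BOOLEAN        -- whether the status can still change
    );

    CREATE TABLE transaction_facts (
        id          BIGINT,
        account_dim INTEGER,       -- key into the account dimension
        date_dim    INTEGER,       -- key into the date dimension
        status_dim  INTEGER,       -- key into status_dim
        amount      DECIMAL(18, 2)
    );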

For a transaction, the status is not definitively known at processing time. Almost all statuses fall into one of these cases:

  • the transaction is OK
  • the transaction is KO
  • the transaction is OK, but still to be confirmed.

The last status is the problematic one, since I can receive the confirmation of a transaction a few days (up to 10, and sometimes even more) after the original transaction.
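
To make the cases concrete, the status dimension could hold rows like these (the ids and codes are made up for illustration; final marks whether the status can still change):

    INSERT INTO status_dim (id, code, description, final) VALUES
        (1, 'OK',     'Transaction succeeded',                 TRUE),
        (2, 'KO',     'Transaction failed',                    TRUE),
        (3, 'OK_TBC', 'Transaction OK, awaiting confirmation', FALSE);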

How should I handle this kind of late change? Intuitively, I would be tempted to just reassign the existing transactions to the new status (a plain UPDATE, sketched after the list below), but it makes me think of two things:

  • Is it good practice? (do not rewrite history, etc.)
  • How do I handle this kind of change in BigQuery, Redshift, or any other append-only system? On a very large number of rows this becomes a problem, since these systems don't handle updates well.
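
For illustration, the reassignment I have in mind would be a plain per-row UPDATE (the ids come from the made-up status rows above):

    -- Hypothetical: move a confirmed transaction from 'OK_TBC' (id 3) to 'OK' (id 1)
    UPDATE transaction_facts
    SET    status_dim = 1
    WHERE  id = 42
      AND  status_dim = 3;

Run once per transaction, on millions of late confirmations, this is exactly the pattern these systems handle poorly.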
nemenems

1 Answer

If

  • this does not need to be a true "financial transaction" table AND
  • you don't need to keep a history of the values (e.g. what was the value as of some previous date)

Then you can, and should, simply update the value in place.

If you are using Redshift, you can do this efficiently by writing a batch of updates to a staging table (COPY from S3) and then applying them all in one go as a single UPDATE.
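
A sketch of that pattern (the staging table name, S3 path, and IAM role below are placeholders; the COPY options depend on your file format):

    -- 1. Stage the day's batch of confirmations loaded from S3
    CREATE TEMP TABLE status_updates (
        transaction_id BIGINT,
        new_status_dim INTEGER
    );

    COPY status_updates
    FROM 's3://my-bucket/confirmations/2018-11-13.csv'            -- placeholder path
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'    -- placeholder role
    FORMAT AS CSV;

    -- 2. Apply the whole batch as one set-based UPDATE
    UPDATE transaction_facts
    SET    status_dim = status_updates.new_status_dim
    FROM   status_updates
    WHERE  transaction_facts.id = status_updates.transaction_id;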

Jon Scott
    And make sure you run VACUUM on a regular basis: updates run as a DELETE plus an INSERT in Redshift, so the space is only freed when you VACUUM. – hadooper Nov 13 '18 at 05:08
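
For reference, the maintenance pass the comment describes could be as simple as the following (how often to schedule it is a judgment call based on your batch cadence):

    -- Reclaim the space left by the DELETE half of each update, then refresh stats
    VACUUM DELETE ONLY transaction_facts;
    ANALYZE transaction_facts;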