
I am starting to build a star schema, and I like it ^^

I have a design problem with dimensional modeling.

I have a fact table with one row per transaction (the finest grain) in the star schema. Something like this (simplified version):

transaction_facts
- id
- account_dim
- date_dim
- status_dim
- amount

status_dim
- id
- code
- description
- final
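
In SQL, the two tables might look roughly like this (the column types are my assumption, not part of the original design):

    CREATE TABLE status_dim (
        id          INTEGER,
        code        VARCHAR(10),
        description VARCHAR(255),
        final       BOOLEAN        -- whether the status can still change
    );

    CREATE TABLE transaction_facts (
        id          BIGINT,
        account_dim INTEGER,       -- key into the account dimension
        date_dim    INTEGER,       -- key into the date dimension
        status_dim  INTEGER,       -- key into status_dim
        amount      DECIMAL(18, 2)
    );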

For a transaction, the status is not definitively known at processing time. Almost all statuses fall into one of these cases:

  • the transaction is OK
  • the transaction is KO
  • the transaction is OK, but still to be confirmed.

The last status is the problematic one, since I can receive the confirmation of a transaction a few days (up to 10, and sometimes even more) after the original transaction.
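
To make the cases concrete, the status dimension could hold rows like these (the ids and codes are made up for illustration; final marks whether the status can still change):

    INSERT INTO status_dim (id, code, description, final) VALUES
        (1, 'OK',     'Transaction succeeded',                 TRUE),
        (2, 'KO',     'Transaction failed',                    TRUE),
        (3, 'OK_TBC', 'Transaction OK, awaiting confirmation', FALSE);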

How should I handle this kind of late change? Intuitively, I would be tempted to just reassign the existing transactions to the new status (a plain UPDATE, sketched after the list below), but it makes me think of two things:

  • Is it good practice? (do not rewrite history, etc.)
  • How do I handle this kind of change in BigQuery, Redshift, or any other append-only system? On a very large number of rows this becomes a problem, since these systems don't handle updates well.
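
For illustration, the reassignment I have in mind would be a plain per-row UPDATE (the ids come from the made-up status rows above):

    -- Hypothetical: move a confirmed transaction from 'OK_TBC' (id 3) to 'OK' (id 1)
    UPDATE transaction_facts
    SET    status_dim = 1
    WHERE  id = 42
      AND  status_dim = 3;

Run once per transaction, on millions of late confirmations, this is exactly the pattern these systems handle poorly.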
nemenems

1 Answer

If

  • this does not need to be a true "financial transaction" table AND
  • you don't need to keep a history of the values (e.g. what was the value as of some previous date)

Then you can, and should, simply update the value in place.

If you are using Redshift, you can do this efficiently by writing a batch of updates to a staging table (COPY from S3) and then applying them all in one go as a single UPDATE.
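
A sketch of that pattern (the staging table name, S3 path, and IAM role below are placeholders; the COPY options depend on your file format):

    -- 1. Stage the day's batch of confirmations loaded from S3
    CREATE TEMP TABLE status_updates (
        transaction_id BIGINT,
        new_status_dim INTEGER
    );

    COPY status_updates
    FROM 's3://my-bucket/confirmations/2018-11-13.csv'            -- placeholder path
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'    -- placeholder role
    FORMAT AS CSV;

    -- 2. Apply the whole batch as one set-based UPDATE
    UPDATE transaction_facts
    SET    status_dim = status_updates.new_status_dim
    FROM   status_updates
    WHERE  transaction_facts.id = status_updates.transaction_id;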

Jon Scott
    And make sure you run VACUUM on a regular basis: updates run as a DELETE plus an INSERT in Redshift, so the space is only freed when you VACUUM. – hadooper Nov 13 '18 at 05:08
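
For reference, the maintenance pass the comment describes could be as simple as the following (how often to schedule it is a judgment call based on your batch cadence):

    -- Reclaim the space left by the DELETE half of each update, then refresh stats
    VACUUM DELETE ONLY transaction_facts;
    ANALYZE transaction_facts;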