1

I am currently designing a data warehouse for a financial company. While a large amount of the process is quite standard I have been presented with an issue (That I believe only exists in the finance sector) of data events that effect a number of rows and their history that can happen at any time.

To explain the issue better. Assume we have an Account A and other 2 months 4 transactions have occurred that effect its balance, changing it from 10000 to 20000. When I run a report for that month its fine it will show the activity that derives that value. Now it gets difficult, a month after I backdate a transaction that effects that balance changing it from 20000 to 15000.

Running a report Before that back dating should tell me the original 20000, but after back dated transaction should tell me the 15,000.

To illustrate better refer to data below.


Transactions for September and October

with a back dated transaction on the 28th of October for the 13th of September of $500

and a back dated transaction on the 8th of November for the 17th of September to credit the $-50 

╔═════════════════╦═════════════════════════╦════════╦══════════════════╦═══════════════╦═════════════╦═════════╗
║ Key_Transaction ║ SK_TransactionEffective ║ Amount ║ PrincipleBalance ║ SK_ReportDate ║ SK_AsOfDate ║ Version ║
╠═════════════════╬═════════════════════════╬════════╬══════════════════╬═══════════════╬═════════════╬═════════╣
║               1 ║ 12/09/2018              ║  -1000 ║            20000 ║ 12/09/2018    ║ NULL        ║ 1       ║
║               6 ║ 13/09/2018              ║   -500 ║            19500 ║ 13/09/2018    ║ 28/10/2018  ║ 2       ║
║               2 ║ 16/09/2018              ║    -50 ║            19950 ║ 16/09/2018    ║ NULL        ║ 1       ║
║               7 ║ 16/09/2018              ║    -50 ║            19450 ║ 16/09/2018    ║ 28/10/2018  ║ 2       ║
║              12 ║ 16/09/2018              ║     50 ║            19950 ║ 16/09/2018    ║ 8/11/2018   ║ 3       ║
║               3 ║ 1/10/2018               ║    250 ║            20200 ║ 30/09/2018    ║ NULL        ║ 1       ║
║               8 ║ 1/10/2018               ║    250 ║            19700 ║ 30/09/2018    ║ 28/10/2018  ║ 2       ║
║              13 ║ 1/10/2018               ║    250 ║            20200 ║ 30/09/2018    ║ 8/11/2018   ║ 3       ║
║               4 ║ 6/10/2018               ║  -1200 ║            19000 ║ 6/10/2018     ║ NULL        ║ 1       ║
║               9 ║ 6/10/2018               ║  -1200 ║            17800 ║ 6/10/2018     ║ 28/10/2018  ║ 2       ║
║              14 ║ 6/10/2018               ║  -1200 ║            19000 ║ 6/10/2018     ║ 8/11/2018   ║ 3       ║
║               5 ║ 22/10/2018              ║    100 ║            19100 ║ 22/10/2018    ║ NULL        ║ 1       ║
║              10 ║ 22/10/2018              ║    100 ║            17900 ║ 22/10/2018    ║ 28/10/2018  ║ 2       ║
║              15 ║ 22/10/2018              ║    100 ║            19100 ║ 22/10/2018    ║ 8/11/2018   ║ 3       ║
║              11 ║ 29/10/2018              ║  -1000 ║            16900 ║ 29/10/2018    ║ NULL        ║ (New)1  ║
║              16 ║ 29/10/2018              ║  -1000 ║            18100 ║ 29/10/2018    ║ 8/11/2018   ║ (New)2  ║
╚═════════════════╩═════════════════════════╩════════╩══════════════════╩═══════════════╩═════════════╩═════════╝

Now when I run a report for September (2018-09-01 to 2018-09-30) I should be V1 or when SK_AsOfDate is NULL

If I run a report for October (2018-10-01 to 2018-10-31) my last record should be (11) with a principle balance of 16900

And my current Principle balance (As of 2018-11-09) should be calculated as of the balance from (16) with PB of (18100)

I have added the SK_AsOfDate to try and deal with versioning issue, but I am still struggling to see a simple and elegant way to achieve this "What was my balance as of 2018-09-30 that will ignore V2 and V3 alterations.

I want to get this right and luckily am not too far down any path, so suggestions are welcome! I am happy to add as many fields as makes this process simple for reporting out of the other end.

Mat
  • 202,337
  • 40
  • 393
  • 406
Caz1224
  • 1,539
  • 1
  • 14
  • 37
  • Just to boil down your explanation.. are you actually interested in preserving the two versions of the balance forever? Or you happy to apply the backdated transaction, fix up the balance, and just keep an audit record? Do you actually need two simultaneous versions? – Nick.Mc Oct 15 '18 at 06:00
  • Hey Nick, Yeah that is the idea. So reporting on the 2018-10-06 would give me version 1 with a PB of 19000. If I had the reported date 2018-10-29 the version 2 back dating would have been in effect so 2018-10-06 should give me the PB of 17800. Likewise if I have my report date as 2018-11-2018, version 3 is the data I should be getting. – Caz1224 Oct 15 '18 at 22:22
  • But you never want to see version 1 on 2018-10-06 right? So you’re not interested in preserving versions, you just want the right answer. So far it seems like you’re over complicating it. – Nick.Mc Oct 15 '18 at 22:39

3 Answers3

3

In the financial (and some other) transaction data you basically have two time dimensions.

Transaction date - representig the real time of the transaction happen, but due to technical reason you may recieve the transaction later.

Booking date - this is the timestamp the transaction entered your booking system. Sometimes called entry date.

Concerning the transaction date the transaction may occure as late arriving, contrary booking date is per definition always up to date.

The two time dimension allow two different kinds of reports.

The booking date report is typically used for bookkeping purposes (as it never changes in history). The transaction date report is more time realistic, but running it on two days for the last month can produce two different results (due to late transaction arrived on the second day).

Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53
  • Hey Marmite. This is the methodology I was going for with the AsOfDate. Just wrapping my mind around keeping my reporting queries simplistic while also being able to know that transaction 1 and 6 are the same transaction record just under diffrent versioning and then based on reporting date >= AsOfDate which version to use. – Caz1224 Oct 15 '18 at 22:25
0

It looks that is some problem with late arriving facts. The question is what you would like to report later. Would you like to report the new values somehow or just to ignore the new arriving facts.

The first step is to determine businesskey that would enable to notify the difference. Is the Amount or PrincipleBalancefor Key_Transaction changing in time, or just new records arriving? Try to create snapshots of table to find the diffenrence of values in time in order to create good busineskey.

Some good ideas can be found here: http://www.disoln.org/2013/12/Design-Approach-to-Handle-Late-Arriving-Dimensions-and-Late-Arriving-Facts.html

What is the source database ? In Sql Server you can try to use Change Data Capture (it has to be enabled on server) or create mechanism mentioned above in your ETL.

I guess that the table you mentioned is not a low level grain but some kind of already aggregation based on other tables. Try to ask what stays technically behind it and dig deeper to find how it works.

Andrzej M.
  • 92
  • 3
0

I think your case can be solved with "snapshot" tables. In financial world like you elaborated as of 2018-10-31 or as of 2018-11-09 is important and you need to keep a copy of your data for each "as of", it might be different for each org looks like yours is weekly. It is upto you to decide the frequency. When you have this set of data regardless of the final state you can go back and get an accurate report.

The way to create these "snapshot" tables is basically create a copy of your fact table on each "as of" date with a "snapshot date", this snapshot date aka "as of" could be used in your reports to see the version of data you need to see.

Let me know if this solves your problem.

demircioglu
  • 3,069
  • 1
  • 15
  • 22