I am building a data warehouse for my company. Recently, I just realized that there are some holes (potentially very dangerous) in my SCD type 2 dimension implementation, so that I have to review it over.
The current "fromdate" of a SCD type 2 dimension table is the date it came to the data warehouse, or the date that it replaced an older record, and the "todate" is usually null, or the date a new record with a same natural key came in replacing the old record.
Currently, when loading fact, I get the surrogate key for that fact by using the natural key and condition iscurrent = true or todate = null.
I just realize that this doesn't guarantee the correctness of the surrogate key for fact, for example:
What if the change happened at 11:00AM. This means: half of the transaction occured during that day will be related to the old dimension record, but half of them will related to the new dimension record. But when the data comes to the data warehouse, all the transactions of that day will be treated to be related to the new dimension, and this is not correct.
If we use datetime of the transactions to get the surrogate key more precisely, when loading fact records to the data warehouse, all the transaction that occured before the day the dimension comes to the Data Warehouse will not be able to find any dimension surrogate key related to it. For example: I made the dimension table yesterday, so all the start-date in that SCD 2 dimension table will have a min value of yesterday, while nearly all the old transactions (which haven't been loaded to the data warehouse) happened before that day. So they will have no surrogate key. Such paradox.
I even try to make it more precise by consolidate the start-date of a row, by trying to pass the create date of that dimension row in the OLTP system. But still I can not find the most correct way to do it. First the datetime in Data Warehouse and the OLTP system is different (because they might belong to different GMT+X)...
And many other problems .....
I understand that if we want to track the history in a perfectly precise accuracy, the only way is that we must implement it in the OLTP system by directly writing the related entity to the transaction records. Data warehouse can not do it. But I still feel that there are too many holes in the SCD 2 concept, or that I didn't implement the SCD Typ2 2 system correctly. So please teach me if the above problems is normal, or point the mistake in my understanding out for me.