0

I am using Kimball methodology for slowly changing dimensions in our datawarehouse. We have run into problems where a record is being updated multiple times in one day and the ETL runs every 15 minutes. So the Effective and Expiry Dates are getting messed up.

For example:

First      Last    Effective Date    Expiry Date   CurrRowInd
John       Smith   Jan 01, 1900      Aug 5, 2015   N
Jonathon   Smith   Aug 6, 2015       Aug 10, 2015  N
Jonathon   Smithe  Aug 11, 2015      Aug 10, 2015  Y

See how the last row has an EffectiveDate greater than the Expiry Date - this isn't good :(

Has anyone else encountered this type of issue - and how did you handle it?

I am using the Slowly-Changing Dimension Tool in SSIS

tember
  • 1,418
  • 13
  • 32

3 Answers3

1

Either you have to add Time to the validity dates in the source table (which isn't really that difficult to do), or adjust the code which sets the Effective Date/Expiry Date when someone makes a change (this may be a trigger, for example).

Your convention about validity dates seems to be that a row is valid from the Effective Date inclusive (in time terms, from Effective Date 00:00:00) to Expiry Date inclusive (in time terms, up to Expiry Date 23:59:59). So the code which updates these dates would have to set an earlier, now completely-invalidated update on the same day to:

EffectiveDate: today ExpiryDate: day before.

Obviously the ETL for the dimension would have to be consistent with this, and ignore rows where EffectiveDate>ExpiryDate.

SebTHU
  • 1,385
  • 2
  • 11
  • 22
0

Your issue looks like more of data mapping. Check your mapping connects in the ETL Package. Use a Flag Column. Look up on the records that exisit in your table and flag to 1. Select only the flagged records and lookup on all attributes. If changes update CurrRowInd as N and insert a new record

Unna
  • 42
  • 2
  • I don't really understand. The issue is that the job runs every 15 mins. So on Aug 15, there were two changes made - one to change it to Jonathon and one to change his last name to Smithe. But these changes occurred at two different times on one day. One solution would be to change it to a Type 1 dimension - but I have to discuss that with the business - at this point I am looking for a technical solution. Another would be to add time to the Effective Date and Expiry Date - but that would have a very large technical impact – tember May 02 '16 at 19:29
-1

I ran into this problem again and found a solution. After the SCD process runs I execute a delete statement to remove all but one of the entries made in a single day.

;with  cte as (
select e.EmployeeKey
        ,e.EmployeeID
        ,e.EffectiveDate
        ,e.ExpiryDate
        ,e.CurrentRowInd
        ,iif(e.EffectiveDate = LEAD(e.EffectiveDate) OVER (partition by EmployeeID order by EffectiveDate), 'DELETE','') AS Note
from DimEmployees e
where EmployeeID in (
            select EmployeeID
            from DimEmployees 
            group by EmployeeID, EffectiveDate
            having count(*) > 1
            )

)
DELETE
from cte
where Note = 'DELETE'
tember
  • 1,418
  • 13
  • 32