2

I have a Customer table with the following structure.

CustomerId Name   Address    Phone
1          Joe    123 Main   NULL

I also have an Audit table that tracks changes to the Customer table.

Id  Entity   EntityId  Field    OldValue      NewValue     Type   AuditDate  
1   Customer 1         Name     NULL          Joe          Add    2016-01-01
2   Customer 1         Phone    NULL          567-54-3332  Add    2016-01-01
3   Customer 1         Address  NULL          456 Centre   Add    2016-01-01
4   Customer 1         Address  456 Centre    123 Main     Edit   2016-01-02
5   Customer 1         Phone    567-54-3332   843-43-1230  Edit   2016-01-03
6   Customer 1         Phone    843-43-1230   NULL         Delete 2016-01-04

I have a CustomerHistory reporting table that will be populated with a daily ETL job. It has the same fields as Customer Table with additional field SnapShotDate.

I need to write a query that takes the records in Audit table, transforms and inserts into CustomerHistory as seen below.

CustomerId Name   Address      Phone         SnapShotDate
1          Joe    456 Centre   567-54-3332   2016-01-01
1          Joe    123 Main     567-54-3332   2016-01-02
1          Joe    123 Main     843-43-1230   2016-01-03
1          Joe    123 Main     NULL          2016-01-04

I guess the solution would involve a self-join on Audit table or a recursive CTE. I would appreciate any help with developing this solution.

Note: Unfortunately, I do not have the option to use triggers or change the Audit table schema. Query performance is not a concern since this will be a nightly ETL process.

Thracian
  • 651
  • 4
  • 8
  • 24
  • Just to be clear, you already have an ETL to populate the CustomerHistory, why do you want to populate it from Audit table? BTW, this is an interesting puzzle :) – FLICKER Jun 10 '16 at 00:02
  • @FLICKER - Just to clarify, I need to build a query to populate CustomerHistory table via an ETL process. It is not being populated with data at this point. – Thracian Jun 10 '16 at 00:58
  • Got it. I'm close to find a solution – FLICKER Jun 10 '16 at 17:11
  • This has solved my problem. http://sqlmag.com/t-sql/last-non-null-puzzle – Thracian Jun 12 '16 at 06:09

1 Answers1

0

You can use below script.

DROP TABLE #tmp

CREATE TABLE #tmp (
    id INT Identity
    , EntityId INT
    , NAME VARCHAR(10)
    , Address VARCHAR(100)
    , Phone VARCHAR(20)
    , Type VARCHAR(10)
    , SnapShotDate DATETIME
    )

;with cte1 as (
select AuditDate, EntityId, Type, [Name], [Address], [Phone]
from 
    (select AuditDate, EntityId, Type, Field, NewValue from #Audit) p
pivot
    (
    max(NewValue)
    for Field in ([Name], [Address], [Phone])
    ) as xx
)
insert into #tmp (EntityId, Name, Address, Phone, Type, SnapShotDate)
select EntityId, Name, Address, Phone, Type, AuditDate
from cte1


-- update NULLs columns with the most recent value
update #tmp
set Name = (select top 1 Name from #tmp tp2 
            where EntityId = tp2.EntityId and Name  is not null 
            order by id desc)
where Name is null

update #tmp
set Address = (select top 1 Address from #tmp tp2 
               where EntityId = tp2.EntityId and Address is not null 
               order by id desc)
where Address is null

update #tmp
set Phone = (select top 1 Phone from #tmp tp2 
             where EntityId = tp2.EntityId and Phone is not null 
             order by id desc)
where Phone is null

To Create Test Data, use below script

CREATE TABLE #Customer (
    CustomerId INT
    , NAME VARCHAR(10)
    , Address VARCHAR(100)
    , Phone VARCHAR(20)
    )

INSERT INTO #Customer
VALUES (1, 'Joe', '123 Main', NULL)

CREATE TABLE #Audit (
    Id INT
    , Entity VARCHAR(50)
    , EntityId INT
    , Field VARCHAR(20)
    , OldValue VARCHAR(100)
    , NewValue VARCHAR(100)
    , Type VARCHAR(10)
    , AuditDate DATETIME
    )

insert into #Audit values
(1,   'Customer', 1,         'Name'     ,NULL            ,'Joe'          ,'Add'    ,'2016-01-01'),
(2,   'Customer', 1,         'Phone'    ,NULL            ,'567-54-3332'  ,'Add'    ,'2016-01-01'),
(3,   'Customer', 1,         'Address'  ,NULL            ,'456 Centre'   ,'Add'    ,'2016-01-01'),
(4,   'Customer', 1,         'Address'  ,'456 Centre'    ,'123 Main'     ,'Edit'   ,'2016-01-02'),
(5,   'Customer', 1,         'Phone'    ,'567-54-3332'   ,'843-43-1230'  ,'Edit'   ,'2016-01-03'),
(6,   'Customer', 1,         'Phone'    ,'843-43-1230'   ,NULL           ,'Delete' ,'2016-01-04'),
(7,   'Customer', 2,         'Name'     ,NULL            ,'Peter'        ,'Add'    ,'2016-01-01'),
(8,   'Customer', 2,         'Phone'    ,NULL            ,'111-222-3333'  ,'Add'    ,'2016-01-01'),
(8,   'Customer', 2,         'Address'  ,NULL            ,'Parthenia'   ,'Add'    ,'2016-01-01')

Result

EntityId    Name    Address     Phone           Type    SnapShotDate
1           Joe     456 Centre  567-54-3332     Add     2016-01-01
1           Joe     123 Main    843-43-1230     Edit    2016-01-02
1           Joe     123 Main    843-43-1230     Edit    2016-01-03
1           Joe     123 Main    843-43-1230     Delete  2016-01-04
FLICKER
  • 6,439
  • 4
  • 45
  • 75
  • Thank you for the response. When I run the query, I get a different result-set (see below). It has one row for Joe, the rest is for Peter and the snapshotdate is not accurate. In addition, the solution requires an update statement per field. Not a big deal for 3 fields but when there are more fields, there will be a lot of repeated code. I am thinking we could come up with be a leaner solution using a self-join (or recursive CTE). Again, thanks for this initial effort. – Thracian Jun 11 '16 at 14:01
  • id EntityId NAME Address Phone Type SnapShotDate 1 1 Joe 456 Centre 567-54-3332 Add 2016-01-01 00:00:00.000 2 2 Peter Parthenia 111-222-3333 Add 2016-01-01 00:00:00.000 3 1 Peter 123 Main 843-43-1230 Edit 2016-01-02 00:00:00.000 4 1 Peter 123 Main 843-43-1230 Edit 2016-01-03 00:00:00.000 5 1 Peter 123 Main 843-43-1230 Delete 2016-01-04 00:00:00.000 – Thracian Jun 11 '16 at 14:01
  • In the first query, when populating the temp table, use ORDER BY EntityId, AuditDate. Let me know how it worked – FLICKER Jun 11 '16 at 16:51
  • or use ORDER BY EntityId, AuditDate. – FLICKER Jun 11 '16 at 17:11