0

I have a fact table with each row representing these key pieces of information

  • Serial Number
  • Activity Type (Shipment, Return, Claim)
  • Activity Date
  • Account Name

A serial can have MANY events over its life. For instance

  1. Shipped to _
  2. Returned from _
  3. Shipped to _
  4. Claim from _
  5. Return from _
  6. Claim from _
  7. Shipped to _
  8. Claim from _

The challenge here is that I need to check if every instance of that was a valid transaction based on prior transactions. For instance:

  • A return is only valid if it matches the last shipped account
  • A claim is only valid if it hasn't been returned and the claim matches the last shipped account.

What I have been able to do so far is create reports as of the latest ship event for that serial. Essentially ignoring everything else.

What I can't figure out is how I would create a full year report that captures ALL possible violations though the course of the year. Keeping in mind that I have thousands of serials I need to run this report for.

In other words using my example above, I can easily identify #8 as valid/invalid, but how would I create a report that also says the status of #4 and #6? And then do this for every serial. I'm sure this type of problem has come up frequently, but I haven't been able to find resources to similar issues.

JohnB
  • 1,743
  • 5
  • 21
  • 40

2 Answers2

2

I love this fun, sound like requirements I had from my business users, here is how I would do it. To do similar rules I used LEAD and LAG analytic functions. Below is demonstration how to use them. Just figure out what rule is and just write it out using LAG or LEAD

LEAD - http://technet.microsoft.com/en-us/library/hh213125.aspx

LAG - http://technet.microsoft.com/en-us/library/hh231256.aspx

DECLARE @MyTable AS TABLE
    (
     SerialNumber BIGINT
    ,ActivityType VARCHAR(25)
    ,ActivityDate DATETIME
    ,AccountName VARCHAR(100)
    )

INSERT INTO @MyTable VALUES(1,'SHIP','01/01/2013','Bill');
INSERT INTO @MyTable VALUES(2,'RETURN','01/02/2013','Bill');
INSERT INTO @MyTable VALUES(3,'SHIP','01/03/2013','Bill');
INSERT INTO @MyTable VALUES(4,'RETURN','01/01/2013','Joe');
INSERT INTO @MyTable VALUES(5,'SHIP','01/02/2013','Joe');
INSERT INTO @MyTable VALUES(6,'SHIP','01/01/2013','James');
INSERT INTO @MyTable VALUES(7,'SHIP','01/02/2013','James');
INSERT INTO @MyTable VALUES(8,'SHIP','01/02/2013','Bill');

SELECT *
,CASE WHEN LAG(ActivityType) OVER (PARTITION BY AccountName ORDER BY ActivityDate ASC) IS NULL AND ActivityType IN ('Return','Claim') THEN 'FAIL' ELSE 'PASS' END AS Rule1
,CASE WHEN LAG(ActivityType) OVER (PARTITION BY AccountName ORDER BY ActivityDate ASC) IN ('Return','Claim') AND ActivityType IN ('Return','Claim') THEN 'FAIL' ELSE 'PASS' END AS Rule2
,CASE WHEN LAG(ActivityType) OVER (PARTITION BY AccountName ORDER BY ActivityDate ASC) = 'SHIP' AND ActivityType IN ('Return','Claim') THEN 'PASS' ELSE 'FAIL' END AS Rule3
,CASE WHEN LAG(ActivityType) OVER (PARTITION BY AccountName ORDER BY ActivityDate ASC) = 'SHIP' AND ActivityType = 'SHIP' THEN 'FAIL' ELSE 'PASS' END AS Rule4
FROM @MyTable
ORDER BY AccountName, ActivityDate

Rule1 check to seee that Return and Claim only happens after ship, if it is first item that happens that rule failed.

Rule2 checks to make sure that Return and Claim do not have happen after each other.

Rule3 is more of verifying good records where Return and Claim happened in right order.

Rule4 is to prevent shipping to same account twice.

  • Ah I remember reading about Lag in 2012 but haven't used it yet. This looks promising. I will give it a try and see how it works. – JohnB Jan 31 '14 at 21:45
  • LAG and LEAD allows you to access another row of the same table without reading that row again, this way you can compare values between rows. –  Jan 31 '14 at 21:47
  • Looks like this does the job, but the logic is going to be a bit more complicated due to data quality issues (ie back to back returns). Thanks for the help – JohnB Feb 01 '14 at 03:04
1

Here is the logic for solving the problem.

You are going to append to each record the last shipped account and the last returned account. These are the values that are needed for the logic.

Getting these is a bit complicated. First, the query assigns 1 to each shipped/returned row. Then it does a cumulative sum, to get rows that have the same "most recent shipped/returned". Then within the rows, it will assign the account.

After all this, the logic should be easy. Here is goes:

select f.*
from (select f.*, max(AccountName) over (partition by Serial, ShippedGroup) as LastShippedAccount,
             max(AccountName) over (partition by Serial, ReturnedGroup) as LastReturnedGroup
      from (select f.*,
                   sum(case when ActivityName = 'Shipped' then 1 else 0 end) over (partition by Serial
                                                                                   order by ActivityDate
                                                                                  ) as ShippedGroup,
                   sum(case when ActivityName = 'Returned' then 1 else 0 end) over (partition by Serial
                                                                                  order by ActivityDate
                                                                                 ) as ReturnedGroup,
            from fact f
           ) f
     ) f
where not (ActivityName = 'Return' and ActivityAccount = LastShippedAccount) or
      not (ActivityName = 'Claim' and ActivityAccount = LastShippedAccount and ActivityAccount <> LastReturnedAccount);
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786