6

Here is a table with sample data:

DECLARE @TestTable TABLE (
    ItemID INT,
    A INT,
    B INT,
    Month INT)

INSERT INTO @TestTable VALUES (1234, 5, 9, 1)
INSERT INTO @TestTable VALUES (1234, 6, 9, 2)
INSERT INTO @TestTable VALUES (4321, 5, 11, 1)
INSERT INTO @TestTable VALUES (4321, 12, 11, 2)
INSERT INTO @TestTable VALUES (1324, 14, 6, 1)
INSERT INTO @TestTable VALUES (1324, 5, 6, 2)
INSERT INTO @TestTable VALUES (1234, 1, 9, 3)
INSERT INTO @TestTable VALUES (1324, 9, 6, 3)

Something to note is that the B column is always the same as it's only used once in this calculation, but is needed for the initial calculation.

I am attempting to subtract B from A on the first row, then on subsequent rows subtract the previous rows difference from A. Effectively, B - A = C on the first then C - A on all subsequent rows FOR THE RELATED ItemID.

Here are the results I'm expecting:

ItemID  A   B   C   Month   RowNumber
1234    5   9   4   1       1
1234    6   9   -2  2       2
1234    1   9   -3  3       3
1324    14  6   -8  1       1
1324    5   6   -13 2       2
1324    9   6   -22 3       3
4321    5   11  6   1       1
4321    12  11  -6  2       2

Here is how I am accomplishing this.

;WITH CTE_TestValue AS (
    SELECT 
        Main.ItemID,
        Main.A,
        Main.B,
        Main.Month,
        ROW_NUMBER() OVER (Partition BY Main.ItemID ORDER BY Main.Month) AS RowNumber
    FROM @TestTable AS Main
),
CTE_TestColumnC AS (
    SELECT 
        MainA.ItemID,
        MainA.A,
        MainA.B,
        (MainA.B - MainA.A) AS C,
        MainA.Month,
        MainA.RowNumber
    FROM CTE_TestValue AS MainA
        WHERE MainA.Rownumber = 1

    UNION ALL

    SELECT 
        MainB.ItemID,
        MainB.A,
        MainB.B,
        (Sub.C - MainB.A) AS C,
        MainB.Month,
        MainB.RowNumber
    FROM CTE_TestValue AS MainB
        INNER JOIN CTE_TestColumnC AS Sub
            ON MainB.RowNumber - 1 = Sub.RowNumber
            AND MainB.ItemID = Sub.ItemID
--      CROSS JOIN CTE_TestColumnC AS Sub
--          WHERE Sub.RowNumber + 1 = MainB.RowNumber
--          AND MainB.ItemID = Sub.ItemID 
)
SELECT 
    Main.ItemID,
    Main.A,
    Main.B,
    Main.C,
    Main.Month,
    Main.RowNumber
FROM CTE_TestColumnC AS Main
ORDER BY ItemID, Month, RowNumber

This works fine on a small data-sample, but I'm dealing with about 20,000 ItemId's each repeating 10 times. It finishes all the first row calculations instantly, as expected, and then the calculation times go up DRASTICALLY.

As you can see I've tried both an INNER JOIN and a CROSS JOIN. I believe they have the same execution plan with the parameters that I've given the CROSS JOIN.

Is there a more effective/efficient way to accomplish this?

I allowed this to run for 5 hours yesterday to see if it ever ended.. it did not.

Another note: When I'm using this on the test data I SELECT WITHOUT using ORDER to hopefully help speed things along. The ORDER is just for my convenience when I'm fact checking.

jayEss
  • 129
  • 1
  • 1
  • 8
  • Pretty sure this is nondeterministic BY Main.ItemID ORDER BY Main.ItemID as ItemID repeats. – paparazzo Oct 10 '12 at 20:05
  • This is really a dulled down example of a much larger problem. There is actually another column in the data I'm using that will properly order it as I need it. I just didn't want to muddy up the example as that has left me with no answers in the past – jayEss Oct 10 '12 at 20:45
  • Based on the up-votes of my example query I am assuming that this is the best way. I guess I will need to find a way to index the data-sample to hopefully speed things along. Can anyone explain why this calculation is taking so long? I assume it is related to the recursive "loop" that has to re-SELECT the previous data-set 10 times per ItemID. – jayEss Oct 10 '12 at 22:08
  • No, I don't think that is is best way to do it. CTE is just syntax and is reevaluated. I found the problem statement flawed and u dismissed my comment. Subtract the previous row is not C - A it is A - Cprior. – paparazzo Oct 10 '12 at 22:54
  • @Blam I did not intend to dismiss your comment. I just felt it was irrelevant to the issue at hand. However, I now see that it could cause problems in any answer I may receive. I have changed the original post to more closely reflect what I am dealing with. – jayEss Oct 10 '12 at 23:14
  • "*I tried an inner join and a cross join*" but those are two completely different concepts and should produce different results (unless one of the tables has only one row). –  Oct 11 '12 at 06:30

2 Answers2

7

Your problem is that you are using a CTE as the source of a recursive CTE. Your first CTE will be executed once for each iteration of your recursive CTE. With your test data that means that CTE_TestValue is created 8 times.

Put the result of CTE_TestValue in a temp table that has a clustered primary key on (RowNumber, ItemID) and use that temporary table as the source of data for the recursive CTE CTE_TestColumnC.

Also change the join condition in the recursive part to ON MainB.RowNumber = Sub.RowNumber + 1. That will make the query able to use the index on the temporary table.

DECLARE @TestTable TABLE (
    ItemID INT,
    A INT,
    B INT,
    Month INT)

INSERT INTO @TestTable VALUES (1234, 5, 9, 1)
INSERT INTO @TestTable VALUES (1234, 6, 9, 2)
INSERT INTO @TestTable VALUES (4321, 5, 11, 1)
INSERT INTO @TestTable VALUES (4321, 12, 11, 2)
INSERT INTO @TestTable VALUES (1324, 14, 6, 1)
INSERT INTO @TestTable VALUES (1324, 5, 6, 2)
INSERT INTO @TestTable VALUES (1234, 1, 9, 3)
INSERT INTO @TestTable VALUES (1324, 9, 6, 3)

CREATE TABLE #TestValue
(
  ItemID INT,
  A INT,
  B INT,
  Month INT,
  RowNumber INT,
  primary key(RowNumber, ItemID)
)

INSERT INTO #TestValue
SELECT 
    Main.ItemID,
    Main.A,
    Main.B,
    Main.Month,
    ROW_NUMBER() OVER (Partition BY Main.ItemID ORDER BY Main.Month) AS RowNumber
FROM @TestTable AS Main


;WITH CTE_TestColumnC AS (
    SELECT 
        MainA.ItemID,
        MainA.A,
        MainA.B,
        (MainA.B - MainA.A) AS C,
        MainA.Month,
        MainA.RowNumber
    FROM #TestValue AS MainA
        WHERE MainA.Rownumber = 1

    UNION ALL

    SELECT 
        MainB.ItemID,
        MainB.A,
        MainB.B,
        (Sub.C - MainB.A) AS C,
        MainB.Month,
        MainB.RowNumber
    FROM #TestValue AS MainB
        INNER JOIN CTE_TestColumnC AS Sub
            ON MainB.RowNumber = Sub.RowNumber + 1
            AND MainB.ItemID = Sub.ItemID
)
SELECT 
    Main.ItemID,
    Main.A,
    Main.B,
    Main.C,
    Main.Month,
    Main.RowNumber
FROM CTE_TestColumnC AS Main
ORDER BY ItemID, Month, RowNumber

DROP TABLE #TestValue

In the query plan for your query the problem is shown in the table scan in the lower right corner. with this test data it is executed 8 times with a total of 64 rows returned:

enter image description here

The query plans for the query with a temporary table: enter image description here enter image description here

Mikael Eriksson
  • 136,425
  • 22
  • 210
  • 281
0

I hope I understood correctly what you are trying to do.
Here is my solution:

WITH DATA AS (
SELECT *, row_number() over (ORDER BY itemid) RN
FROM TestTable),
RECURSIVE AS (
   SELECT itemID, B-A AS C, RN
  FROM DATA
  WHERE RN = 1
  UNION ALL
  SELECT T1.itemID, t2.C - t1.A, t1.RN
  FROM DATA AS T1
  INNER JOIN
  RECURSIVE AS T2
  ON t1.RN = T2.Rn+1)
SELECT ItemID, C
FROM RECURSIVE

You can find the full example (with your data) here

Gidil
  • 4,137
  • 2
  • 34
  • 50
  • That's basically the same query I have, but your's doesn't PARTITION the row_number so it's not taking into account that different numbers are related to different items. I should specify that in my question. – jayEss Oct 10 '12 at 20:38