1

Okay everyone,

Apologies in advance for the length. This one's actually kind of fun, though.

I wrote up a SQL script that I was semi-proud of yesterday because I thought it was quite clever. Turns out it gets ruined by performance issues, and I can't even test it because of that, so it may not even be doing what I think sigh.

This problem is best explained with an example:

Column A | Column B | Column C | Column D

   Heart   |      K      |  2/1/2013  |   3/1/2013
   Heart   |      K      |  2/1/2013  |   3/1/2013
   Heart   |      K      |  1/1/2013  |   3/1/2013
   Heart   |      K      |  2/1/2013  |   4/1/2013
   Spade   |      4      |  2/1/2013  |   3/1/2013
   Spade   |      3      |  2/1/2013  |   3/1/2013
   Club    |      4      |  2/1/2013  |   3/1/2013

With this table I need to: 1. Starting with the first, update the row with the data following it if the values in Column A match, 2. delete the second row after the update if there was a match, and 3. move on to the next row if there was no match and rerun the same process.

If there's a match, the higher row updates based on the following:

  1. Column A: Nothing
  2. Column B: If both values are the same, keep the value in one, otherwise write 'Multiple'
  3. Column C: Keep the earlier date between the two,
  4. Column D: Keep the later date between the two,

Then I delete the lower row.

My example should result in the following:

Column A | Column B | Column C | Column D

   Heart   |      K      |  1/1/2013  |  4/1/2013
   Spade   |   Multiple  |  2/1/2013  |   3/1/2013
   Club    |      4      |  2/1/2013  |   3/1/2013

To do all this I created two table variables, inserted the same data into both, and then cycled through the second (@ScheduleB) looking for matches to update the row in the first table (@ScheduleA). I then deleted the row below the row in @A (because it's the same as B). Finally, when there wasn't a match, I moved to the next row in @A to start the process over. At least that's what the code's supposed to do -- see below.

The problem is performance is TERRIBLE. I've considered using a Cursor, but don't know if the performance would help there.

Any suggestions?

Declare @ScheduleA Table
(
    RowNumber int,
    Period nvarchar(MAX),
    Program nvarchar(MAX),
    ControlAccount Nchar(50),
    WorkPackage Nchar(50),
    CAM Nchar(50),
    EVM Nchar(50),
    Duration int,
    BLStart datetime,
    BLFinish datetime
)

Declare @ScheduleB Table
    (
        RowNumber int,
        Period nvarchar(MAX),
        Program nvarchar(MAX),
        ControlAccount Nchar(50),
        WorkPackage Nchar(50),
        CAM Nchar(50),
        EVM Nchar(50),
        Duration int,
        BLStart datetime,
        BLFinish datetime
    )

Insert INTO @ScheduleA
Select ROW_NUMBER() OVER(order by workpackage desc) as [Row], Period, Program,       
ControlAccount, WorkPackage, CAM, EVM, Duration, BLStart, BLFinish
From ScheduleData 
where program = @Program and period = @Period

Insert INTO @ScheduleB
Select ROW_NUMBER() OVER(order by workpackage desc) as [Row], Period, Program,   
ControlAccount, WorkPackage, CAM, EVM, Duration, BLStart, BLFinish
From ScheduleData 
where program = @Program and period = @Period

declare @i int = 1
declare @j int = 2

--Create a loop for the second variable that counts up to the last row of the B table
While @j < (select MAX(ROWNUMBER) + 1 from @ScheduleB)
Begin
--if the tables match by WorkPackage THEN
IF ((select WorkPackage from @ScheduleA where RowNumber = @i) = 
    (select workpackage from @ScheduleB where RowNumber = @j))
    Begin 
        Update @ScheduleA 
 --Update the Schedule CAM, BLStart, BLFinish of     the A table (if necessary)
set CAM = 
    Case
               --Set values in @ScheduleA Column B based on logic
        End,

BLStart = 
        Case
               --Set values in @ScheduleA Column C  based on logic
    End,

BLFinish = 
    Case
               --Set values in @ScheduleA Column D based on logic
            End
    Where RowNumber = @i

Delete from @ScheduleA 
where RowNumber = @i + 1

set @j = @j + 1 --next row in B
End
ELSE 
set @i = @i + 1
END

EDIT: To clarify, column B is NOT an integer column, I was simply using this as an example because cards are pretty easily understood. I've since updated the column to include K's.

user
  • 1,261
  • 2
  • 21
  • 43
  • 1
    When I see a while loop with row-by-row processing in an SQL script, I find that it's almost always better done by writing a small program instead of a script.. – Eric Petroelje Jun 20 '13 at 19:10
  • Is that right? Okay! The application is written in VB.net -- any suggestion on how I might do it there? – user Jun 20 '13 at 19:21
  • I'd have to think about that, but regardless it would probably best be asked as a different question. – Eric Petroelje Jun 20 '13 at 19:44

1 Answers1

3

Based on your requirements I think a solution like this would work:

SELECT 
    [column a], 
    CASE WHEN MAX([column b]) <> MIN([column b]) THEN 'multiple' ELSE CAST(MAX([column b]) AS NVARCHAR(10)) END,
    MIN([column c]), 
    MAX([column d]) 
FROM Table
GROUP BY [column a]

EDIT:

SQL Fiddle

JodyT
  • 4,324
  • 2
  • 19
  • 31
  • Thanks, Jody, the only thing is that column B is not an integer column (I should've been clearer), so Column B doesn't have anything to do with Mins and Maxs. – user Jun 20 '13 at 19:27
  • `Min` and `Max` should still with with char/varchar data type. But this is just an example, if this actually gives the right result you can improve the Min/Max part. – JodyT Jun 20 '13 at 19:30
  • It also looks like there wouldn't be a test for the intermediate values, though, so only the Min and Max values are being checked when there's a match in Column A. – user Jun 20 '13 at 19:33
  • Your requirements state that it should write just return 'multiple' if there is more than 1 distinct value. It should not matter what the intermediate values are. – JodyT Jun 20 '13 at 19:36
  • I see what you mean. I'll give this a try. Thanks. – user Jun 20 '13 at 19:40
  • I added a SQL Fiddle example. Min and Max don't actually take the last and first row, if that was the case than the SQL Fiddle should return 4 for spade. It will return two distinct values if they exist. – JodyT Jun 20 '13 at 19:42
  • 1
    Based on @JodyT's logic, I think you could get the same results by using CASE WHEN COUNT(Distinct [columnb]) > 1 THEN 'Multiple' ELSE CAST(MAX([column b]) AS NVARCHAR(10)) END, – Stuart Ainsworth Jun 20 '13 at 22:51