4

I'm trying to determine the best approach here in MSSQL 2008.

Here is my sample data

TransDate  Id     Active
-------------------------
1/18 1pm   5      1    
1/18 2pm   5      0    
1/18 3pm   5      Null    
1/18 4pm   5      1    
1/18 5pm   5      0    
1/18 6pm   5      Null

If grouped by Id and ordered by the TransDate, I want the last Non Null Value for the Active Column, and the MAX of TransDate

SELECT MAX(TransDate) AS TransDate, 
       Id,
       --LASTNonNull(Active) AS Active

Here would be the results:

TransDate  Id  Active
---------------------    
1/18 6pm   5   0

It would be like a Coalesce but over the rows, instead of two values/columns.

There would be many other columns that would also have this similiar method applied, so I really don't want to make a seperate join for each of the columns.

Any ideas?

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
Bo Flexson
  • 481
  • 1
  • 6
  • 9

5 Answers5

5

I'd probably use a correlated sub query.

SELECT MAX(TransDate)             AS TransDate,
       Id,
       (SELECT TOP (1) Active
        FROM   T t2
        WHERE  t2.Id = t1.Id
               AND Active IS NOT NULL
        ORDER  BY TransDate DESC) AS Active
FROM   T t1
GROUP  BY Id  

A way without

SELECT
    Id,
    MAX(TransDate) AS TransDate,
    CAST(RIGHT(MAX(CONVERT(CHAR(23),TransDate,121) + CAST(Active AS CHAR(1))),1) AS BIT) AS Active,
    /*You can probably figure out a more efficient thing to 
    compare than the above depending on your data. e.g.*/
    CAST(MAX(DATEDIFF(SECOND,'19500101',TransDate) * CAST(10 AS BIGINT) + Active)%10  AS BIT) AS Active2
FROM T
GROUP BY Id

Or following the comments would cross apply work better for you?

WITH T (TransDate, Id, Active, SomeOtherColumn) AS
(
select GETDATE(), 5, 1, 'A' UNION ALL
select 1+GETDATE(), 5, 0, 'B' UNION ALL
select 2+GETDATE(), 5, null, 'C' UNION ALL
select 3+GETDATE(), 5, 1, 'D' UNION ALL
select 4+GETDATE(), 5, 0, 'E' UNION ALL
select 5+GETDATE(), 5, null,'F'

),
T1 AS
(
SELECT MAX(TransDate) AS TransDate,
       Id
FROM   T
GROUP  BY Id  
)
SELECT T1.TransDate,
       Id,
       CA.Active AS Active,
       CA.SomeOtherColumn AS SomeOtherColumn
FROM   T1
CROSS APPLY (SELECT TOP (1) Active, SomeOtherColumn
        FROM   T t2
        WHERE  t2.Id = T1.Id
               AND Active IS NOT NULL
        ORDER  BY TransDate DESC) CA
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • +1 for 2nd option. interesting approach. The CASE in the MAX is extraneous though – RichardTheKiwi Jan 18 '11 at 23:37
  • @cyberkiwi - Good Point! Fixed. – Martin Smith Jan 19 '11 at 00:26
  • This certainly works, Thanks everyone for the time and responses. I think in the end I may have to design the tables etc. differently. I don't think I am going to get the performance I need. The Active column in this example is one of many columns, so I don't really want up to 10 correlated sub queries in the design. If I was only going to query on 1 Id at a time it would be just fine. But some of the queries will be over a range of Ids. – Bo Flexson Jan 19 '11 at 14:50
  • @Bo - You could maybe use `CROSS APPLY` instead of the correlated sub query then. I only made it a correlated sub query as you were only returning one column and there would be no benefit of using `apply` here. – Martin Smith Jan 19 '11 at 14:52
  • @Martin - That would work perfect if the last non null value for each column was in the same row. If that were the case, the function could return the "Last Non Null Row" then CROSS APPLY that. I suppose, in the function, I could determine and figure out each Last Not Null column and return it as 1 row. That might work. Thanks for the suggestion. – Bo Flexson Jan 19 '11 at 15:06
  • @Martin - The cross apply actually works great for this sort of thing. In my actual case, I'll need to do 12 OUTER APPLYs to get each column. But it's against a Table Variable with a small number of rows. So it should be fast enough. – Bo Flexson Feb 01 '11 at 20:33
1

This example should help, using analytical functions Max() OVER and Row_Number() OVER

create table tww( transdate datetime, id int, active bit)
insert tww select GETDATE(), 5, 1
insert tww select 1+GETDATE(), 5, 0
insert tww select 2+GETDATE(), 5, null
insert tww select 3+GETDATE(), 5, 1
insert tww select 4+GETDATE(), 5, 0
insert tww select 5+GETDATE(), 5, null

select maxDate as Transdate, id, Active
from (
    select *,
        max(transdate) over (partition by id) maxDate,
        ROW_NUMBER() over (partition by id
                order by case when active is not null then 0 else 1 end, transdate desc) rn
    from tww
) x
where rn=1

Another option, quite expensive, would be doing it through XML. For educational purposes only

select
    ID = n.c.value('@id', 'int'),
    trandate = n.c.value('(data/transdate)[1]', 'datetime'),
    active = n.c.value('(data/active)[1]', 'bit')
from
(select xml=convert(xml,
    (select id [@id],
        (   select *
            from tww t
            where t.id=tww.id
            order by transdate desc
            for xml path('data'), type)
    from tww
    group by id
    for xml path('node'), root('root'), elements)
)) x cross apply xml.nodes('root/node') n(c)

It works on the principle that the XML generated has each record as a child node of the ID. Null columns have been omitted, so the first column found using xpath (child/columnname) is the first non-null value similar to COALESCE.

RichardTheKiwi
  • 105,798
  • 26
  • 196
  • 262
  • This goes through the table once only. However it will not scale for multiple columns, since they will generate different row numbers, so Martin's answer works better for the question – RichardTheKiwi Jan 18 '11 at 23:34
  • The XML option solves the issue of multiple columns, but XML is _slow_ and there may be some issues where xml-entities are involved in the data – RichardTheKiwi Jan 19 '11 at 00:34
0

You could use a subquery:

SELECT MAX(TransDate) AS TransDate
,      Id
,      (
       SELECT  TOP 1 t2.Active 
       FROM    YourTable t2
       WHERE   t1.id = t2.id
               and t2.Active is not null 
       ORDER BY 
               t2.TransDate desc
       )
FROM   YourTable t1
Andomar
  • 232,371
  • 49
  • 380
  • 404
0

I created a temp table named #temp to test my solution, and here is what I came up with:

transdate              id  active
1/1/2011 12:00:00 AM    5   1
1/2/2011 12:00:00 AM    5   0
1/3/2011 12:00:00 AM    5   null
1/4/2011 12:00:00 AM    5   1
1/5/2011 12:00:00 AM    5   0
1/6/2011 12:00:00 AM    5   null
1/1/2011 12:00:00 AM    6   2
1/2/2011 12:00:00 AM    6   3
1/3/2011 12:00:00 AM    6   null
1/4/2011 12:00:00 AM    6   2
1/5/2011 12:00:00 AM    6   null

This query...

select max(a.transdate) as transdate, a.id, (
  select top (1) b.active
  from #temp b
  where b.active is not null
  and b.id = a.id
  order by b.transdate desc
) as active
from #temp a
group by a.id

Returns these results.

transdate              id  active
1/6/2011 12:00:00 AM    5   0
1/5/2011 12:00:00 AM    6   2
Jason
  • 1,325
  • 6
  • 14
0

Assuming a table named "test1", how about using ROW_NUMBER, OVER and PARTITION BY?

SELECT transdate, id, active FROM 
    (SELECT transdate, ROW_NUMBER() OVER(PARTITION BY id ORDER BY transdate desc) AS rownumber, id, active  
     FROM test1
     WHERE active is not null) a 
WHERE a.rownumber = 1
Rich
  • 149
  • 1
  • 9
  • This won't return the overall max time as required (it returns the time for the last non null active row per group) – Martin Smith Jan 19 '11 at 00:24
  • Based on the GROUP BY id in the original question, I assumed one record per id was desired. For the single, most recent non null transaction, the query is simpler: select top 1 transdate, ROW_NUMBER() OVER(ORDER BY TransDate desc), test1.id, active from test1 where active is not null – Rich Jan 19 '11 at 00:30
  • The desired results have `1/18 6pm` which is `MAX(TransDate)` for that group. Your query would return `1/18 5pm`. – Martin Smith Jan 19 '11 at 10:19