2

Is there a way to use the lead function such that I can get the next row where something has changed, as opposed to the next row where it is the same?

In this example, RowType can be 'in' or 'out', and for each 'in' row I need to know the next RowNumber where the type becomes 'out'. I have been playing with the lead function as it is really fast, but I haven't been able to get it working. Essentially I need to do the following, which is to partition by a RowType that isn't the one in the current row:

select
RowNumber
,RowType --In this case I am only interested in RowType = 'In'
, Lead(RowNumber) 
    OVER (partition by "RowType = out" --This is the bit I am stuck on--
          order by  RowNumber ASC) as NextOutFlow
from table
order by RowNumber asc 

Thanks in advance for any help

SomeGuy30145

2 Answers

1

Rather than using lead() I would use an outer apply that returns the next row with type out for all rows with type in:

select RowNumber, RowType, nextOut 
from your_table t
outer apply (
  select min(RowNumber) as nextOut 
  from your_table 
  where RowNumber > t.RowNumber and RowType='Out'
) oa
where RowType = 'In'
order by RowNumber asc 

Given sample data like:

RowNumber   RowType
1           in
2           out
3           in
4           in
5           out
6           in

This would return:

RowNumber   RowType nextOut
1           in      2
3           in      5
4           in      5
6           in      NULL
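For each 'in' row the outer apply just takes the smallest later RowNumber whose type is 'out'. A minimal Python sketch of that logic (function name is mine), run on the sample data above, reproduces the result set:

```python
# Simulate the outer apply: for each 'in' row, find the minimum
# later RowNumber whose RowType is 'out' (None if there is none).
rows = [(1, 'in'), (2, 'out'), (3, 'in'), (4, 'in'), (5, 'out'), (6, 'in')]

def next_out(rows):
    result = []
    for num, typ in rows:
        if typ != 'in':
            continue  # the outer query keeps only RowType = 'In'
        later_outs = [n for n, t in rows if n > num and t == 'out']
        result.append((num, typ, min(later_outs) if later_outs else None))
    return result

print(next_out(rows))
# [(1, 'in', 2), (3, 'in', 5), (4, 'in', 5), (6, 'in', None)]
```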
jpw
  • Using an outer apply the process takes roughly 2 hours, whereas if the lead function could be made to work it might take only a few seconds (using lead to just get the next row currently takes 4 seconds on my dataset) – SomeGuy30145 Feb 25 '15 at 11:56
  • @SomeGuy30145 I didn't expect outer apply to take that long - may I ask how large is the data set you're working with? I couldn't think of a way to use `lead()` to do this though maybe it is possible. – jpw Feb 25 '15 at 14:28
  • Roughly 2 million rows – SomeGuy30145 Feb 25 '15 at 23:30
  • @SomeGuy30145 I made a small change to the query and in my test on a million rows with random in/out values the query now completes in 4 seconds. I did set up a backing index for `rownumber,rowtype` too but that shouldn't increase the performance that much. Please do try it now and see if it executes fast enough for you. – jpw Feb 26 '15 at 00:21
  • Instead of `MIN` inside the `APPLY` use `TOP(1)` and `ORDER BY`. Same result, but optimizer processes them differently. I've seen examples where `MIN` was much worse. There has to be an index on `(rowtype, rownumber)` (in this order). – Vladimir Baranov Feb 26 '15 at 03:56
  • @VladimirBaranov - thanks - Splitting the table into two, one for inflow and one for outflow, applying an index on each of the new tables, and then outer applying them together made a pretty efficient query - the whole ETL took 2 min 30 s as opposed to 8 hours for just the last part. Thanks! – SomeGuy30145 Feb 26 '15 at 04:23
  • @jpw - the comment above applies to you too - Thanks – SomeGuy30145 Feb 26 '15 at 04:24
  • @VladimirBaranov My original query used top/order by in the outer apply (without a backing index though) and performed terribly which was why I changed to min. With the index both versions perform equally and fast in my test. Thanks for the index recommendation. – jpw Feb 26 '15 at 07:53
0

I think this will work.
If you use a bit field for in/out you will get better performance.

;with cte1 as 
(
  SELECT [inden], [OnOff]
       , lag([OnOff]) over (order by [inden]) as [lagOnOff]
    FROM [OnOff] 
),   cte2 as
(
  select [inden], [OnOff], [lagOnOff]
       , lead([inden]) over (order by [inden]) as [Leadinden]
    from cte1 
   where [OnOff] <> [lagOnOff]
      or [lagOnOff] is null
 )
 select [inden], [OnOff], [lagOnOff], [Leadinden]
   from cte2 
  where [OnOff] = 'true'
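A Python sketch of what the two CTEs compute (the test data and function name are mine): cte1 attaches lag of the value, cte2 keeps only the rows where the value changed and takes lead of the key over that filtered set. Note that in a run of consecutive 'true' rows only the first one survives the change filter, which matches the behaviour described in the comments below:

```python
def change_pairs(rows):
    """rows: list of (inden, OnOff) tuples. Mimics the lag/lead CTE query."""
    # cte1: attach lag(OnOff), ordered by inden
    cte1 = [(inden, onoff, rows[i - 1][1] if i > 0 else None)
            for i, (inden, onoff) in enumerate(sorted(rows))]
    # cte2: keep rows where the value changed (or has no predecessor),
    # then take lead(inden) within this filtered set
    cte2 = [r for r in cte1 if r[2] is None or r[1] != r[2]]
    result = []
    for i, (inden, onoff, lag) in enumerate(cte2):
        lead = cte2[i + 1][0] if i + 1 < len(cte2) else None
        if onoff == 'true':
            result.append((inden, lead))
    return result

# Rows 3 and 4 are both 'true'; only row 3 survives the change filter.
print(change_pairs([(1, 'true'), (2, 'false'), (3, 'true'),
                    (4, 'true'), (5, 'false')]))
# [(1, 2), (3, 5)]
```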

The following is probably slower, but if you have the right indexes it may work:

select t1.rowNum as 'rowNumIn', min(t2.rownum) as 'nextRowNumOut' 
  from table t1 
  join table t2 
    on t1.rowType = 'In' 
   and t2.rowType = 'Out'
   and t2.rowNum > t1.rowNum  
   and t2.rowNum < t1.rowNum + 1000  -- if you can constrain it 
 group by t1.rowNum
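For reference, a Python sketch of that join (test data and names are mine). Because it is an inner join, an 'In' row with no following 'Out' within the bound simply drops out of the result rather than getting a NULL:

```python
def next_out_join(rows, bound=1000):
    # Mimics: inner join pairing each 'In' row with every 'Out' row in
    # (rowNum, rowNum + bound), then group by t1.rowNum taking min(t2.rowNum).
    result = {}
    for n1, t1 in rows:
        if t1 != 'In':
            continue
        matches = [n2 for n2, t2 in rows
                   if t2 == 'Out' and n1 < n2 < n1 + bound]
        if matches:  # inner join: no matching 'Out' means no output row
            result[n1] = min(matches)
    return result

print(next_out_join([(1, 'In'), (2, 'Out'), (3, 'In'),
                     (4, 'In'), (5, 'Out'), (6, 'In')]))
# {1: 2, 3: 5, 4: 5}
```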
paparazzo
  • the first one works really quickly, like faster than just calling the table, which is amazing. However it only gets the first instance of 'In' and then its subsequent 'Out'; if there are two sequential 'In' rows, they should both get the same 'Out' - the output of the other suggestion shows this. I changed the in/out field to a bit like you suggested. – SomeGuy30145 Feb 26 '15 at 02:32
  • Ouch. If you need to jump multiple rows then why would you think lead / lag would be the correct approach? – paparazzo Feb 26 '15 at 14:16
  • I didn't know if it would be the correct approach at all, but if it worked it might have been really fast. – SomeGuy30145 Feb 27 '15 at 23:42