2

Is there a way to use the lead function such that I can get the next row where something has changed, as opposed to the next row where it is the same?

In this example, RowType can be 'in' or 'out', and for each 'in' row I need to know the next RowNumber where the type becomes 'out'. I have been playing with the lead function as it is really fast, but I haven't been able to get it working. Essentially I need to do the following, which is to partition by a RowType that isn't the one in the current row:

select
RowNumber
,RowType --In this case I am only interested in RowType = 'In'
, Lead(RowNumber) 
    OVER (partition by "RowType = out" --This is the bit I am stuck on--
          order by  RowNumber ASC) as NextOutFlow
from table
order by RowNumber asc 

Thanks in advance for any help

SomeGuy30145

2 Answers

1

Rather than using lead() I would use an outer apply that returns the next row with type out for all rows with type in:

select RowNumber, RowType, nextOut 
from your_table t
outer apply (
  select min(RowNumber) as nextOut 
  from your_table 
  where RowNumber > t.RowNumber and RowType='Out'
) oa
where RowType = 'In'
order by RowNumber asc 

Given sample data like:

RowNumber   RowType
1           in
2           out
3           in
4           in
5           out
6           in

This would return:

RowNumber   RowType nextOut
1           in      2
3           in      5
4           in      5
6           in      NULL
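For each 'in' row the outer apply just takes the smallest later RowNumber whose type is 'out'. A minimal Python sketch of that logic (function name is mine), run on the sample data above, reproduces the result set:

```python
# Simulate the outer apply: for each 'in' row, find the minimum
# later RowNumber whose RowType is 'out' (None if there is none).
rows = [(1, 'in'), (2, 'out'), (3, 'in'), (4, 'in'), (5, 'out'), (6, 'in')]

def next_out(rows):
    result = []
    for num, typ in rows:
        if typ != 'in':
            continue  # the outer query keeps only RowType = 'In'
        later_outs = [n for n, t in rows if n > num and t == 'out']
        result.append((num, typ, min(later_outs) if later_outs else None))
    return result

print(next_out(rows))
# [(1, 'in', 2), (3, 'in', 5), (4, 'in', 5), (6, 'in', None)]
```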
jpw
  • Using an outer apply the process takes roughly 2 hours, whereas if the lead function could be made to work it might take only a few seconds (using lead to just get the next row currently takes 4 seconds on my dataset) – SomeGuy30145 Feb 25 '15 at 11:56
  • @SomeGuy30145 I didn't expect outer apply to take that long - may I ask how large is the data set you're working with? I couldn't think of a way to use `lead()` to do this though maybe it is possible. – jpw Feb 25 '15 at 14:28
  • Roughly 2 million rows – SomeGuy30145 Feb 25 '15 at 23:30
  • @SomeGuy30145 I made a small change to the query and in my test on a million rows with random in/out values the query now completes in 4 seconds. I did set up a backing index for `rownumber,rowtype` too but that shouldn't increase the performance that much. Please do try it now and see if it executes fast enough for you. – jpw Feb 26 '15 at 00:21
  • Instead of `MIN` inside the `APPLY` use `TOP(1)` and `ORDER BY`. Same result, but optimizer processes them differently. I've seen examples where `MIN` was much worse. There has to be an index on `(rowtype, rownumber)` (in this order). – Vladimir Baranov Feb 26 '15 at 03:56
  • @VladimirBaranov - thanks - Splitting the table into two, one for inflow and one for outflow, applying an index on each of the new tables, and then outer applying them together made a pretty efficient query - the whole ETL took 2 min 30 s as opposed to 8 hours for just the last part. Thanks! – SomeGuy30145 Feb 26 '15 at 04:23
  • @jpw - the comment above applies to you too - Thanks – SomeGuy30145 Feb 26 '15 at 04:24
  • @VladimirBaranov My original query used top/order by in the outer apply (without a backing index though) and performed terribly which was why I changed to min. With the index both versions perform equally and fast in my test. Thanks for the index recommendation. – jpw Feb 26 '15 at 07:53
0

I think this will work.
If you use a bit field for in/out you will get better performance.

;with cte1 as 
(
  SELECT [inden], [OnOff]
       , lag([OnOff]) over (order by [inden]) as [lagOnOff]
    FROM [OnOff] 
),   cte2 as
(
  select [inden], [OnOff], [lagOnOff]
       , lead([inden]) over (order by [inden]) as [Leadinden]
    from cte1 
   where [OnOff] <> [lagOnOff]
      or [lagOnOff] is null
 )
 select [inden], [OnOff], [lagOnOff], [Leadinden]
   from cte2 
  where [OnOff] = 'true'
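A Python sketch of what the two CTEs compute (the test data and function name are mine): cte1 attaches lag of the value, cte2 keeps only the rows where the value changed and takes lead of the key over that filtered set. Note that in a run of consecutive 'true' rows only the first one survives the change filter, which matches the behaviour described in the comments below:

```python
def change_pairs(rows):
    """rows: list of (inden, OnOff) tuples. Mimics the lag/lead CTE query."""
    # cte1: attach lag(OnOff), ordered by inden
    cte1 = [(inden, onoff, rows[i - 1][1] if i > 0 else None)
            for i, (inden, onoff) in enumerate(sorted(rows))]
    # cte2: keep rows where the value changed (or has no predecessor),
    # then take lead(inden) within this filtered set
    cte2 = [r for r in cte1 if r[2] is None or r[1] != r[2]]
    result = []
    for i, (inden, onoff, lag) in enumerate(cte2):
        lead = cte2[i + 1][0] if i + 1 < len(cte2) else None
        if onoff == 'true':
            result.append((inden, lead))
    return result

# Rows 3 and 4 are both 'true'; only row 3 survives the change filter.
print(change_pairs([(1, 'true'), (2, 'false'), (3, 'true'),
                    (4, 'true'), (5, 'false')]))
# [(1, 2), (3, 5)]
```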

The following is probably slower, but if you have the right indexes it may work:

select t1.rowNum as 'rowNumIn', min(t2.rownum) as 'nextRowNumOut' 
  from table t1 
  join table t2 
    on t1.rowType = 'In' 
   and t2.rowType = 'Out'
   and t2.rowNum > t1.rowNum  
   and t2.rowNum < t1.rowNum + 1000  -- if you can constrain it 
 group by t1.rowNum
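For reference, a Python sketch of that join (test data and names are mine). Because it is an inner join, an 'In' row with no following 'Out' within the bound simply drops out of the result rather than getting a NULL:

```python
def next_out_join(rows, bound=1000):
    # Mimics: inner join pairing each 'In' row with every 'Out' row in
    # (rowNum, rowNum + bound), then group by t1.rowNum taking min(t2.rowNum).
    result = {}
    for n1, t1 in rows:
        if t1 != 'In':
            continue
        matches = [n2 for n2, t2 in rows
                   if t2 == 'Out' and n1 < n2 < n1 + bound]
        if matches:  # inner join: no matching 'Out' means no output row
            result[n1] = min(matches)
    return result

print(next_out_join([(1, 'In'), (2, 'Out'), (3, 'In'),
                     (4, 'In'), (5, 'Out'), (6, 'In')]))
# {1: 2, 3: 5, 4: 5}
```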
paparazzo
  • the first one works really quickly, like faster than just calling the table, which is amazing. However it only gets the first instance of 'In' and then its subsequent 'Out'; if there are two sequential 'In' rows, they should both get the same 'Out' - the output of the other suggestion shows this. I changed the in/out field to a bit like you suggested. – SomeGuy30145 Feb 26 '15 at 02:32
  • Ouch. If you need to jump multiple rows then why would you think lead / lag would be the correct approach? – paparazzo Feb 26 '15 at 14:16
  • I didn't know if it would be the correct approach at all, but if it worked it might have been really fast. – SomeGuy30145 Feb 27 '15 at 23:42