4

Suppose I have a table as follows:

Row     Volume
1       10000
2       8000
3       0.01
4       0
5       5000
6       0

Now, if I were to use the LAG() function in SQL Server, I believe the following table will result:

Row     Volume     LAG(Volume)
1       10000      NULL
2       8000       10000
3       0.01       8000
4       0          0.01
5       5000       0
6       0          5000

However, this is not exactly what I am looking for, so allow me to provide some background as to why I need the LAG(Volume) column. The reason for it is because small values, such as 0.01 or 0 in my example, are likely to be data processing errors, so I want to override them with the last non-0.01 or non-0 value and then update the table by a JOIN. I believe this requires me to first come up with the LAG(Volume) column, but I would need the LAG() function to ignore some values. In my example, they would be 0.01 and 0. I guess, one could also say that I would like the lagged value to depend on what the lagged value actually is. In other words, for my example, I would like the following table to be produced instead:

Row     Volume     LAG(Volume)
1       10000      NULL
2       8000       10000
3       0.01       8000
4       0          8000
5       5000       8000
6       0          5000

Note that, for Row 6, the lag should be 5000 (and not 8000) as, although its volume is 0, its lag is neither 0 nor 0.01.

Can this be done?

Perhaps if anyone has alternatives as to how I can update these values to the last non-0.01 or non-0 value without having to create an intermediate lag column, please let me know as well.

Any suggestions will be greatly appreciated!

Ethan Mark
  • 293
  • 1
  • 9
  • There is a parameter called OFFSET in LAG function which can be used to achieve this result. This parameter can be a column, subquery, or other expression that evaluates to a positive integer. – Madhukar Aug 21 '23 at 08:39
  • 1
    @Madhukar Hello! Would you be able to elaborate what you are suggesting in an answer? – Ethan Mark Aug 21 '23 at 09:01
  • May be using something like this. Please note that this is just an example and not the answer - LAG(volume, CASE WHEN volume IN (0.01,0) then 2 else 1 end) over (order by row) – Madhukar Aug 21 '23 at 09:06
  • @Madhukar - but there can be an arbitrary length sequence of rows to ignore - e.g. if preceding 10 have `volume IN (0.01,0)` then offset needs to be 11 - how can that `11` be calculated – Martin Smith Aug 21 '23 at 09:13

3 Answers3

4

On earlier versions you can use a concatenation approach to simulate IGNORE NULLS. 2016 DB Fiddle.

It uses MAX with a window of ORDER BY [Row] ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING

Assuming Row is a positive integer the value going into the MAX is a concatenation of Row and Volume constructed such that later Row numbers will be the MAX - but these are nulled out in the cases that you want to ignore and MAX automatically disregards those.

The Volume is then extracted out with SUBSTRING and CAST back to decimal - if you are not in fact using decimal(12,2) then change this to what you are using.

SELECT 
 *,
 CAST(SUBSTRING(MAX(RIGHT(CONCAT('0000000000', [Row]),10) + CASE WHEN Volume > 0.01 THEN CAST(Volume AS VARCHAR(20)) END) OVER (ORDER BY [Row] ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 11, 20) AS decimal(12,2))
FROM YourTable
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
2

Assuming you're on the latest version of SQL Server (as there's no mention you're not) you could use a CASE expression to NULL values lower than n (I assume 1) and then tell LAG to IGNORE NULLS:

LAG (CASE WHEN Volume > 1 THEN Volume END) IGNORE NULLS OVER (ORDER BY [Row]) AS LagVolumn

db<>fiddle

It appears the OP is on 2016, not 2022. I am leaving this here anyway, as it may be useful to others in the future.

Thom A
  • 88,727
  • 11
  • 45
  • 75
  • Hello! Thank you for your answer! Your query works when I exclude the `IGNORE NULLS` part. When I include it, SQL Server throws me an error - "The function 'LAG' must have an OVER clause." Is this because my SQL Server version is not the latest? I am using SQL Server Management Studio v17.9.1. – Ethan Mark Aug 21 '23 at 08:31
  • 1
    This is why I stated *"Assuming you're on the latest version of SQL Server"* @EthanMark . The version of SSMS is irrelevant to that. SSMS 17 supported SQL Server 2008-2017. – Thom A Aug 21 '23 at 08:32
  • My apologies. This is probably a stupid question, but how do I check what version of SQL Server am I on? – Ethan Mark Aug 21 '23 at 08:33
  • `PRINT @@VERSION` would be one way. – Thom A Aug 21 '23 at 08:33
  • I see. Thanks for that. The output says that I am on SQL Server 2016. Is that a problem? – Ethan Mark Aug 21 '23 at 08:34
  • 2
    It's not the latest version, so it won't support `IGNORE NULLS`, @EthanMark . Only 2022+ has `IGNORE NULL`s. You won't be able to achieve what you want with `LAG` specifically. – Thom A Aug 21 '23 at 08:35
2

One simple way would be to implement your logic with a correlation in a sub-query:

select *, (
  select top (1) volume from t t2 
  where t2.[row] < t.[row] and t2.volume > 1
  order by [row] desc
) as Lag_Volume
from t
order by Id;
Stu
  • 30,392
  • 6
  • 14
  • 33