LAG functions and NULLS

Question

How can I tell the LAG function to get the last "not null" value?

For example, see my table bellow where I have a few NULL values on column B and C. I'd like to fill the nulls with the last non-null value. I tried to do that by using the LAG function, like so:

case when B is null then lag (B) over (order by idx) else B end as B,

but that doesn't quite work when I have two or more nulls in a row (see the NULL value on column C row 3 - I'd like it to be 0.50 as the original).

Any idea how can I achieve that? (it doesn't have to be using the LAG function, any other ideas are welcome)

A few assumptions:

The number of rows is dynamic;
The first value will always be non-null;
Once I have a NULL, is NULL all up to the end - so I want to fill it with the latest value.

Thanks

Itzik Ben-Gan wrote a blog on a that problem: http://sqlmag.com/sql-server/how-previous-and-next-condition. Unfortunatley SQL Server doesn't support the `IGNORE NULLS` option in `LAST_VALUE`, then it's simple: `LAST_VALUE(B IGNORE NULLS) OVER (ORDER BY idx)`. — dnoeth, Apr 28 '16 at 07:09

score 8 · Answer 1 · edited Apr 25 '16 at 11:17

You can do it with outer apply operator:

select t.id,
       t1.colA,
       t2.colB,
       t3.colC 
from table t
outer apply(select top 1 colA from table where id <= t.id and colA is not null order by id desc) t1
outer apply(select top 1 colB from table where id <= t.id and colB is not null order by id desc) t2
outer apply(select top 1 colC from table where id <= t.id and colC is not null order by id desc) t3;

This will work, regardless of the number of nulls or null "islands". You may have values, then nulls, then again values, again nulls. It will still work.

If, however the assumption (in your question) holds:

Once I have a NULL, is NULL all up to the end - so I want to fill it with the latest value.

there is a more efficient solution. We only need to find the latest (when ordered by idx) values. Modifying the above query, removing the where id <= t.id from the subqueries:

select t.id,
       colA = coalesce(t.colA, t1.colA),
       colB = coalesce(t.colB, t2.colB),
       colC = coalesce(t.colC, t3.colC) 
from table t
outer apply (select top 1 colA from table 
             where colA is not null order by id desc) t1
outer apply (select top 1 colB from table 
             where colB is not null order by id desc) t2
outer apply (select top 1 colC from table 
             where colC is not null order by id desc) t3;

hey, thanks, but as I said, "The number of rows is dynamic" so how would that work with 5 rows? — Diego, Apr 25 '16 at 10:44
@diego - This will work regardless of the number of rows... But it would have a potentially exponential cost as the data set grows *(the cost of each sub query is higher for the 1000th row than it is for the 999th row)*, but it's certainly tidy for small data-sets. — MatBailie, Apr 25 '16 at 10:56
@GiorgiNakeuri Yes, you are right. I hadn't read the question carefully. — ypercubeᵀᴹ, Apr 25 '16 at 10:56
of course, I misread your query. It does seem to work, +1 thanks for your help — Diego, Apr 25 '16 at 14:24

MatBailie · Answer 2 · 2016-04-25T10:59:42.153

You could make a change to your ORDER BY, to force the NULLs to be first in your ordering, but that may be expensive...

lag(B) over (order by CASE WHEN B IS NULL THEN -1 ELSE idx END)

Or, use a sub-query to calculate the replacement value once. Possibly less expensive on larger sets, but very clunky.
- Relies on all the NULLs coming at the end
- The LAG doesn't rely on that

COALESCE(
    B,
    (
        SELECT
            sorted_not_null.B
        FROM
        (
            SELECT
                table.B,
                ROW_NUMBER() OVER (ORDER BY table.idx DESC)   AS row_id
            FROM
                table
            WHERE
                table.B IS NOT NULL
        )
           sorted_not_null
        WHERE
           sorted_not_null.row_id = 1
    )
)

(This should be faster on larger data-sets, than LAG or using OUTER APPLY with correlated sub-queries, simply because the value is calculated once. For tidiness, you could calculate and store the [last_known_value] for each column in variables, then just use COALESCE(A, @last_known_A), COALESCE(B, @last_known_B), etc)

+1 that does seem to work but my "table" is actually a big query that I don't really want to run more than once, which your solution would require. Thanks very much for the help — Diego, Apr 25 '16 at 14:26
@Diego - other than using LAG, every other answer here (and every approach I can think of) is going to have that problem. — MatBailie, Apr 25 '16 at 14:28

score 1 · Accepted Answer · answered Apr 25 '16 at 12:10

1

if it is null all the way up to the end then can take a short cut

declare @b varchar(20) = (select top 1 b from table where b is not null order by id desc);
declare @c varchar(20) = (select top 1 c from table where c is not null order by id desc); 
select is, isnull(b,@b) as b, insull(c,@c) as c 
from table;

answered Apr 25 '16 at 12:10

paparazzo

44,497
23
105
176

good aproach, I didn't want to declare variables so I ended up doing something like: case when B is null then (select top 1 B from where B is not null order by idx desc) else B end as B great Idea, thanks very much – Diego Apr 25 '16 at 14:32
I think variable are cleaner to read and it assures you that the query optimizer only does it once. – paparazzo Apr 25 '16 at 14:48

score 1 · Answer 4 · edited Mar 26 '21 at 10:50

1

Select max(diff) from(
Select 
    Case when lag(a) over (order by b) is not null
    Then (a -lag(a) over (order by b)) end as diff 
     From <tbl_name> where
    <relevant conditions>
    Order by b) k

Works fine in db visualizer.

edited Mar 26 '21 at 10:50

Syscall

19,327
10
37
52

answered Mar 26 '21 at 10:39

ritusmita

11
1

score -3 · Answer 5 · answered Apr 25 '16 at 10:57

-3

UPDATE table 
SET B = (@n := COALESCE(B , @n))
WHERE B is null;

answered Apr 25 '16 at 10:57

Adesh

34
4

This question is about `SQL Server`. – Giorgi Nakeuri Apr 25 '16 at 10:59
Right idea, but that's the MySQL notation. It *can* be done in SQL server, but not quite written that way. – MatBailie Apr 25 '16 at 11:00
UPDATE table SET B =(select last_value(B ignore nulls) over (order by idx) b from table) where B is null; – Adesh Apr 25 '16 at 11:08
@Adesh - Even that is not SQL Server syntax unfortunately. SQL Server does not support `IGNORE NULLS`. – MatBailie Apr 25 '16 at 11:40
if IGNORE NULLS not work then you can add NULL condition in where clause – Adesh Apr 25 '16 at 12:29

LAG functions and NULLS

5 Answers5