T-SQL: Best way to replace NULL with most recent non-null value?

Question

Assume I have this table:

+----+-------+
| id | value |
+----+-------+
|  1 |     5 |
|  2 |     4 |
|  3 |     1 |
|  4 |  NULL |
|  5 |  NULL |
|  6 |    14 |
|  7 |  NULL |
|  8 |     0 |
|  9 |     3 |
| 10 |  NULL |
+----+-------+

I want to write a query that will replace any NULL value with the last value in the table that was not null in that column.

I want this result:

+----+-------+
| id | value |
+----+-------+
|  1 |     5 |
|  2 |     4 |
|  3 |     1 |
|  4 |     1 |
|  5 |     1 |
|  6 |    14 |
|  7 |    14 |
|  8 |     0 |
|  9 |     3 |
| 10 |     3 |
+----+-------+

If no previous value existed, then NULL is OK. Ideally, this should be able to work even with an ORDER BY. So for example, if I ORDER BY [id] DESC:

+----+-------+
| id | value |
+----+-------+
| 10 |  NULL |
|  9 |     3 |
|  8 |     0 |
|  7 |     0 |
|  6 |    14 |
|  5 |    14 |
|  4 |    14 |
|  3 |     1 |
|  2 |     4 |
|  1 |     5 |
+----+-------+

Or even better if I ORDER BY [value] DESC:

+----+-------+
| id | value |
+----+-------+
|  6 |    14 |
|  1 |     5 |
|  2 |     4 |
|  9 |     3 |
|  3 |     1 |
|  8 |     0 |
|  4 |     0 |
|  5 |     0 |
|  7 |     0 |
| 10 |     0 |
+----+-------+

I think this might involve some kind of analytic function - somehow partitioning over the value column - but I'm not sure where to look.

score 3 · Answer 1 · answered Jan 05 '18 at 02:59

You can use a running sum to set groups and use max to fill in the null values.

select id,max(value) over(partition by grp) as value
from (select id,value,sum(case when value is not null then 1 else 0 end) over(order by id) as grp
      from tbl
     ) t

Change the over() clause to order by value desc to get the second result in the question.

score 3 · Answer 2 · edited Jan 05 '18 at 04:03

The best way has been covered by Itzik Ben-Gan here:The Last non NULL Puzzle

Below is a solution which for 10 million rows and completes around in 20 seconds on my system

SELECT
  id,
  value1,
  CAST(
  SUBSTRING(
  MAX(CAST(id AS binary(4)) + CAST(value1 AS binary(4)))
  OVER (ORDER BY id
  ROWS UNBOUNDED PRECEDING),
  5, 4)
  AS int) AS lastval
FROM dbo.T1;

This solution assumes your id column is indexed

score 2 · Answer 3 · answered Jan 05 '18 at 09:22

You can also try using correlated subquery

select id,
       case when value is not null then value else
       (select top 1 value from table 
        where id < t.id and value is not null  order by id desc) end value  
from table t

Result :

score 1 · Answer 4 · answered Jan 05 '18 at 03:04

If the NULLs are scattered I use a WHILE loop to fill them in

However if the NULLs are in longer consecutive strings there are faster ways to do it.

So here's one approach:

First find a record that we want to update. It has NULL in this record and no NULL in the prior record

SELECT C.VALUE, N.ID  
FROM TABLE C
INNER JOIN TABLE N
ON C.ID + 1 = N.ID
WHERE C.VALUE IS NOT NULL
AND N.VALUE IS NULL;

Use that to update: (bit hazy on this syntax but you get the idea)

UPDATE N
SET VALUE = C.Value
FROM TABLE C
INNER JOIN TABLE N
ON C.ID + 1 = N.ID
WHERE C.VALUE IS NOT NULL
AND N.VALUE IS NULL;

.. now just keep doing it till you run out of rows

-- This is needed to set @@ROWCOUNT to non zero
SELECT 1;


WHILE @@ROWCOUNT <> 0
BEGIN 

UPDATE N
SET VALUE = C.Value
FROM TABLE C
INNER JOIN TABLE N
ON C.ID + 1 = N.ID
WHERE C.VALUE IS NOT NULL
AND N.VALUE IS NULL;

END

The other way is to use a similiar query to get a range of id's to update. This works much faster if your NULLS are usually against consecutive id's

score 1 · Answer 5 · answered Jan 05 '18 at 04:31

Here is the one simple approach using OUTER APPLY

CREATE TABLE #table(id INT, value INT)
INSERT INTO #table VALUES 
(1,5),
(2,4),
(3,1),
(4,NULL),
(5,NULL),
(6,14),
(7,NULL),
(8,0),
(9,3),
(10,NULL)

SELECT t.id, ISNULL(t.value, t3.value) value
FROM #table t
OUTER APPLY(SELECT id FROM #table WHERE id = t.id AND VALUE IS NULL) t2
OUTER APPLY(SELECT TOP 1 value 
            FROM #table WHERE id <= t2.id AND VALUE IS NOT NULL ORDER BY id DESC) t3

OUTPUT:

score 1 · Answer 6 · answered Jan 05 '18 at 05:48

Using this sample data:

if object_id('tempdb..#t1') is not null drop table #t1;
create table #t1 (id int primary key, [value] int null);
insert #t1 values(1,5),(2,4),(3,1),(4,NULL),(5,NULL),(6,14),(7,NULL),(8,0),(9,3),(10,NULL);

I came up with:

with x(id, [value], grouper) as (
select *, row_number() over (order by id)-sum(iif([value] is null,1,0)) over (order by id)
from #t1)
select id, min([value]) over (partition by grouper)
from x;

I noticed, however, that Vamsi Prabhala beat me to it... My solution is identical to what he posted. (arghhhh!). So I thought I'd try a recursive solution. Here's a pretty efficient use of a recursive cte (provided that ID is indexed):

with sorted as (select *, seqid = row_number() over (order by id) from #t1),
firstRecord as (select top(1) * from #t1 order by id),
prev as
(
  select t.id, t.[value], lastid = 1, lastvalue = null
  from sorted t
  where t.id = 1
  union all
  select t2.id, t2.[value], lastid+1, isnull(prev.[value],lastvalue)
  from sorted t2
  join prev on t2.id = prev.lastid+1
)
select id, [value]=isnull([value],lastvalue)--, *
from prev;

Normally I don't like recursive cte's (rCte for short) but in this case it offered an elegant solution and was faster than using the window aggregate function (sum over, min over...). Note the execution plans, the rcte on the bottom. The rCTE get's it done with two index seeks, one of which is for just one row. Unlike the window aggregate solution, the rcte does not require a sort. Running this with statistics io on; the rcte produces much less IO.

All this said, don't use either of these solutions, What the TheGameiswar posted will perform the best by far. His solution on a properly indexed id column would be lightening fast.

score 1 · Answer 7 · answered Jan 05 '18 at 10:28

Following UPDATE statement can be used, please test it before use

update #table
set value = newvalue
from (
    select 
    s.id, s.value,
    (select top 1 t.value from #table t where t.id <= s.id and t.value is not null order by t.id desc) as newvalue
    from #table S
) u
where #table.id = u.id and #table.value is null

score 0 · Answer 8 · answered Jan 05 '18 at 06:24

stop worrying..here's the answer for you :)

SELECT *
INTO   #TempIsNOtNull
FROM   YourTable
WHERE  value IS NOT NULL


SELECT *
INTO   #TempIsNull
FROM   YourTable
WHERE  value IS NULL


UPDATE YourTable
SEt           YourTable.value      =      UpdateDtls.value
FROM   YourTable
JOIN   (
          SELECT OuterTab1.id,
                       #TempIsNOtNull.value
          FROM   #TempIsNull  OuterTab1
          CROSS  JOIN #TempIsNOtNull
          WHERE  OuterTab1.id - #TempIsNOtNull.id > 0
                 AND (OuterTab1.id - #TempIsNOtNull.id)  = ( SELECT  TOP 1 
    OuterTab1.id - #TempIsNOtNull.id

   FROM       #TempIsNull  InnerTab

 CROSS       JOIN #TempIsNOtNull

 WHERE       OuterTab1.id - #TempIsNOtNull.id > 0

 AND OuterTab1.id     =      InnerTab.id

 ORDER BY (OuterTab1.id - #TempIsNOtNull.id) ASC) ) AS UpdateDtls
 ON     (YourTable.id   =  UpdateDtls.id)

T-SQL: Best way to replace NULL with most recent non-null value?

8 Answers8

Linked