1

I have data as below (first 3 columns). I would like to create the 4th column - newstatus.

Logic for the newstatus column is that

  1. for each pk2
  2. if column status is blank then pull value from the most recent row where status is either new or old.

Because of that row 3 gets value new which is from row 2 while row 9 gets value old from row 7. row 8 is ignored because it has value ignore

pk  status  pk2        newstatus
1           1   
2   new     1   
3           1          new
4   ignore  1   
5   ignore  1   
6           2   
7   old     2   
8   ignore  2   
9           2          old
10  new     2   
Dale K
  • 25,246
  • 15
  • 42
  • 71
user2543622
  • 5,760
  • 25
  • 91
  • 159

4 Answers4

3

Hmmm . . . This might be simplest using outer apply

select t.*,
       (case when status is null then t2.status end) as new_status
from t outer apply
     (select top (1) t2.*
      from t t2
      where t2.pk2 = t.pk2 and t2.status in ('old', 'new') and
            t2.pk1 < t.pk1
      order by t2.pk1 desc
     ) t2;

There is actually a way of doing this without subqueries . . . and this might be the most efficient method:

select t.*,
       (case when status is null and
                  max(case when status = 'old' then pk1 end) over (partition by pk2 order by pk1) >
                  max(case when status = 'new' then pk2 else 0 end order by pk1) over (partition by pk2)
             then 'old'
             when status is null and
                  max(case when status = 'new' then pk1 end) over (partition by pk2 order by pk1) >
                  max(case when status = 'old' then pk2 else 0 end) over (partition by pk2 order by pk1)
             then 'new'
      end) as new_status
from t;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • ` max(case when status = 'old' then pk1 end) over (partition by pk2) > max(case when status = 'new' then pk2 else 0 end) over (partition by pk2)` why do you have pk2 in the second max? shouldn't it be pk1? – user2543622 Aug 14 '20 at 20:52
  • also how would your code ensure that we are looking at only the previous rows and not all the rows? – user2543622 Aug 14 '20 at 23:50
  • @user2543622 . . . In the first query, the condition `t2.pk1 < t.pk1` does that. The second query left out the `order by clauses in window frames. Good catch. It now has the right logic. – Gordon Linoff Aug 15 '20 at 01:44
1

You can use window functions.

The idea is to build groups of rows with a cumulative count that increases every time a "new" or "old" status is met. You can then use first_value() to access the corresponding status whenever needed.

select 
    t.*,
    case when status is null and grp > 0
        then first_value(status) over(partition by pk2, grp order by pk1)
        else status
    end as new_status
from (
    select 
        t.*,
        sum(case when status in ('old', 'new') then 1 else 0 end)
            over(partition by pk2 order by pk1) grp
    from mytable t
) t
GMB
  • 216,147
  • 25
  • 84
  • 135
1

You can Use this script

WITH OuterT AS
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY pk2 ORDER BY pk1) num FROM tbl
)
SELECT pk1,[Status],pk2,
(CASE WHEN num=1 or [Status] !='' THEN ''
WHEN num !=1 THEN
(SELECT TOP 1 innerT.[Status] FROM OuterT innerT WHERE innerT.pk2 =OuterT.pk2 and ([status] ='new' or [status] ='old') and num != 1 and innerT.pk1 < OuterT.pk1  ORDER BY pk1 DESC)
END) newstatus
FROM OuterT

How it works:

I used common_table_expression that its name is OuterT

WITH OuterT AS
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY pk2 ORDER BY pk1) num FROM tbl
)

And I used ROW_NUMBER that its name is num

Inner result of OuterT is :

inner result of outerT

And then I used a CASE that Include your logic base on OuterT

(CASE WHEN num=1 or [Status] !='' THEN ''
WHEN num !=1 THEN
(SELECT TOP 1 innerT.[Status] FROM OuterT innerT WHERE innerT.pk2 =OuterT.pk2 and ([status] ='new' or [status] ='old') and num != 1 and innerT.pk1 < OuterT.pk1  ORDER BY pk1 DESC)
END) newstatus

And final result is :

enter image description here

Note: if you want to use script just replace “tbl” with your table name

Amirhossein Yari
  • 2,054
  • 3
  • 26
  • 38
0

This is old school, no windowing function and no OUTER APPLY.

;with max_prior_pk_cte as (
    select tt.pk, tt.pk2,
           max(ttt.pk) max_pk
    from
      #testTable tt
     join
      #testTable ttt on tt.pk2=ttt.pk2
                        and tt.pk>ttt.pk
    where tt.[status] is null
          and ttt.[status] in ('old', 'new')
    group by tt.pk, tt.pk2)
select
  t.*, t_prior.[status] new_status
from
  #testTable t
 left join
  max_prior_pk_cte mppc on t.pk=mppc.pk
 left join
  #testTable t_prior on mppc.max_pk=t_prior.pk;

Here's some sample data:

drop table if exists #testTable;
go
create table #testTable(
  pk        int unique not null,
  [status]  varchar(20),
  pk2       int not null);

insert into #testTable(pk, [status], pk2) values
(1, null, 1),
(2, 'new', 1),   
(3, null, 1),
(4, 'ignore', 1),
(5, 'ignore', 1),
(6, null, 2),   
(7, 'old', 2),   
(8, 'ignore', 2),   
(9, null, 2),
(10, 'new', 2);   
SteveC
  • 5,955
  • 2
  • 11
  • 24