0

I have an address SCD type 2 table but sometimes information entered remain the same I want to write a query that if information is unchanged, retain the previous begin data and set end date to max e.g


OBJID          BEGDA      ENDDA     HASHROW_COL RK 
83022088    2012-03-30  2012-10-28  e1-ef-a9-36 1 
83022088    2012-10-29  2013-09-07  63-69-e5-25 2 
83022088    2013-09-08  2014-08-30  e1-ef-a9-36 3
83022088    2014-08-31  2016-11-26  e1-ef-a9-36 4
83022088    2016-11-27  9999-12-31  e1-ef-a9-36 5

Notice that from rows 3 – 5 the HASHROW_COL remain the same.


Desired result:
OBJID          BEGDA       ENDDA    HASHROW_COL RK 
83022088    2012-03-30  2012-10-28  e1-ef-a9-36 1 
83022088    2012-10-29  2013-09-07  63-69-e5-25 2
83022088    2013-09-08  9999-12-31  e1-ef-a9-36 3

Query so far

select a.objid, a.hashrow_col, 
case when a.objid <> b.objid then b.begda
    when a.hashrow_col = b.hashrow_col and (b.begda - interval '1' day <= a.endda) then 
    a.begda  end,
case when a.objid <> b.objid then b.endda
    when (a.hashrow_col = b.hashrow_col) and (b.begda - interval '1' day <= a.endda) and b.endda > a.endda  
    then b.endda 
    end,
from
(select objid, begda, endda, HASHROW_COL, 
from OTABLE )  a
inner join 
(select objid, begda, endda, HASHROW_COL, 
from OTABLE) b
on
a.objid = b.objid
where 
and a.objid = '83022088' 
order by a.OBJID, a.BEGDA,  a.HASHROW_COL;
Sandra
  • 3
  • 1
  • Are `HASHROW_COL `& `RK` calculated columns (to simplify the logic)? Do you want to fix the data in the SCD table or select those normalized rows? Then it's probably a simple `select normalize` – dnoeth Dec 07 '19 at 18:11

3 Answers3

0

This is a gaps-and-islands problem. In this case, I would use:

select objid, min(begda), max(endda), HASHROW_COL,
       row_number() over (partition by objid order by min(begda)) as ranking
from (select t.*,
             sum(case when prev_endda = begda - interval '1' day then 0 else 1 end) over (partition by objid order by begda) as grouping
      from (select t.*,
                   lag(endda) over (partition by objid, HASHROW_COL order by begda) as prev_endda
            from t
           ) t
      ) t
group by grouping, objid, HASHROW_COL;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Assuming that the hashrow_col is the value which remains the same,(ie you got the values you need to group by) then all you would need is the min(begda) and max(endda)

select objid, min(begda) as begda, max(endda) as enddat, HASHROW_COL,
       min(rnk) 
  from OTABLE
group by objid,hashrow_col
George Joseph
  • 5,842
  • 10
  • 24
0

Here's a solution using the NORMALIZE function and the period data type.

We have to convert your start and end dates to a period. One gotcha with normalize to work is that consecutive end and start dates must overlap, that's why I've added 1 day to the end date (unless it's 9999-12-31').

select normalize on meets or overlaps objid,hashrow_col,duration from (
select
t.*,
period(begda, case when endda = '9999-12-31' then endda else endda +interval '1' DAY end) as duration
from
<your table> t) tt

Normalize and period are extremely powerful in the right circumstances.

Andrew
  • 8,445
  • 3
  • 28
  • 46