0

I have the following table (myt) of events that are observed over a period of time (Start_date : End_Date) for different persons (A,B,C...)

Events can be observed several times for each person.

In the case two events for a given person do overlap, they are joined (JOINED = 1). This step is performed using a simple lag.

case when lag(End_Date) over (partition by Person order by Start_Date) > Start_Date then 1 else NULL end as Joined

Events can be :

  • disjoined for a person (person A has two distinct events a1 and a2);
  • joined for a person (person B has two events, grouped as b1);
  • or both... (person C has her two first events grouped as c1, then a time gap, then two events grouped as c2)

What I would like to have is an automation of column GROUPED (here built manually).

This case is a simplified version of :

  • about 50.000 persons,
  • up to 70 events per persons
  • events grouped in 1 to 20 event bulks per person.

The issue, I think, is that this logic is a 'line by line' approach: lines might be grouped with the previous line based on lagged date. But I can't find a way to group the data 'as long as' the time gap between events is null.

I explored dense_rank, first_value, and row_number, but I can't find the good way to define the partition and group by to begin with.

Is there a way to do this in (Oracle) SQL ?

(StackOverflow points me to this similar question : SQL Partition By alternating groups of rows , but the case of person C, with repeating subgroups gives an additional difficulty)

 -- The table I have
create table myt(
    Obs number ,
    Person VARCHAR(1),
    Start_Date date,
    End_Date date
  );
insert into myt values(1,'A',date'2015-05-25',date'2015-06-13');
insert into myt values(2,'A',date'2017-03-09',date'2017-08-19');
insert into myt values(3,'B',date'2015-05-25',date'2015-06-13');
insert into myt values(4,'B',date'2015-06-02',date'2015-07-22');
insert into myt values(5,'C',date'2015-06-02',date'2015-07-22');
insert into myt values(6,'C',date'2015-07-14',date'2015-09-04');
insert into myt values(7,'C',date'2017-03-09',date'2017-08-19');
insert into myt values(8,'C',date'2017-06-01',date'2017-11-11');
 
 -- the results I would like to obtain (built here manually)
create table results(
    Obs number ,
    GROUPED VARCHAR(2)
  );
insert into results values(1,'a1');
insert into results values(2,'a2');
insert into results values(3,'b1');
insert into results values(4,'b1');
insert into results values(5,'c1');
insert into results values(6,'c1');
insert into results values(7,'c2');
insert into results values(8,'c2');

 -- the table with both
select 
p.OBS
, Person 
, Start_Date
, End_Date
, lag(End_Date) over (partition by Person order by Start_Date) as L_End_Date
, case when lag(End_Date) over (partition by Person order by Start_Date) > Start_Date then 1 else NULL end as Joined
, GROUPED
FROM myt p
LEFT JOIN results r
on p.OBS=r.OBS
ORDER BY Person , Start_Date
Sylvain
  • 35
  • 4

0 Answers0