0

I need help with one task I'm trying to finish. I need to join my data into the smallest possible date ranges and retrieve MIN(P_MIN) and SUM(P_MAX) over objects (in column 'name') under one id.

|ID |NAME    |DATE_FROM |DATE_TO   |P_MAX|P_MIN|
|---|--------|----------|----------|-----|-----|
|1  |OBJECT 1|10/11/2021|10/10/2022|150  |20   |
|1  |OBJECT 1|10/10/2022|02/02/2023|200  |40   |
|1  |OBJECT 1|02/02/2023|18/06/2027|100  |70   |
|1  |OBJECT 2|10/11/2021|01/05/2022|300  |60   |
|1  |OBJECT 2|01/05/2022|01/12/2022|50   |40   |
|1  |OBJECT 2|01/12/2022|18/06/2027|350  |40   |

For above I'd like to obtain

|ID |DATE_FROM |DATE_TO   |SUM_P_MAX|P_MIN|
|---|----------|----------|---------|-----|
|1  |10/11/2021|01/05/2022|150+300  |20   |
|1  |01/05/2022|10/10/2022|50+150   |20   |
|1  |10/10/2022|01/12/2022|200+50   |40   |
|1  |01/12/2022|02/02/2023|350+200  |40   |
|1  |02/02/2023|18/06/2027|100+350  |40   |

"Tips"

  • MIN(date_from) and MAX(date_to) is always the same per object (column 'name').
  • MAX(date_to) can be NULL (that means object lasts to "infinity").
  • Per one object date_from is always the same as previous date_to.
  • There could be more than 2 objects under one id
  • So for MIN(date_from) I need to find MIN(date_to) and then move to next line (so find next min(date_from/to)) and so on. The issue might be that there are two MIN(date_from) and MAX(date_to)

I was trying to resolve it using MATCH_RECOGNIZE but I couldn't get expected results. I'm fixed with MATCH_RECOGNIZE but maybe there is a better way to resolve this?

Can anyone help?

Data:

   CREATE TABLE my_table (id         number
                         ,name       varchar2(100)
                         ,date_from  date
                         ,date_to    date
                         ,p_max      number
                         ,p_min      number);   
                         
 INSERT INTO my_table VALUES (1, 'OBJECT 1', TO_DATE('10/11/2021', 'DD/MM/YYYY'), TO_DATE('10/10/2022', 'DD/MM/YYYY'), 150, 20);
 INSERT INTO my_table VALUES (1, 'OBJECT 1', TO_DATE('10/10/2022', 'DD/MM/YYYY'), TO_DATE('02/02/2023', 'DD/MM/YYYY'), 200, 40);
 INSERT INTO my_table VALUES (1, 'OBJECT 1', TO_DATE('02/02/2023', 'DD/MM/YYYY'), TO_DATE('18/06/2027', 'DD/MM/YYYY'), 100, 70);
 INSERT INTO my_table VALUES (1, 'OBJECT 2', TO_DATE('10/11/2021', 'DD/MM/YYYY'), TO_DATE('01/05/2022', 'DD/MM/YYYY'), 300, 60);
 INSERT INTO my_table VALUES (1, 'OBJECT 2', TO_DATE('01/05/2022', 'DD/MM/YYYY'), TO_DATE('01/12/2022', 'DD/MM/YYYY'),  50, 40);
 INSERT INTO my_table VALUES (1, 'OBJECT 2', TO_DATE('01/12/2022', 'DD/MM/YYYY'), TO_DATE('18/06/2027', 'DD/MM/YYYY'), 350, 40);
MichalAndrzej
  • 77
  • 1
  • 11
  • If your data guarantees that `date_to = next(date_from)` and all objects per id has the same minimum/maximum `date_from/date_to`, then `match_recognize` seems superfluous. Can you please include different min/max dates that cause issues in your approach? – astentx Jan 26 '23 at 14:46
  • Yes, these conditions will always occur (just to note: date_to = next(date_from) is per object, not per id). Not sure which dates should I add, as I have issue with current ones :) I cannot obtain expected results with this set of data – MichalAndrzej Jan 26 '23 at 14:56

2 Answers2

3

To be tested on more data:

with alldates(id, dat) as (
    select id, date_from from my_table
    union 
    select id, date_to from my_table
)
, allintervals(id, date_from, date_to) as (
    select * from (
        select id, dat as date_from, lead(dat) over (partition by id order by dat) as date_to from alldates
    )
    where date_to is not null
)
select inter.id, inter.date_from, inter.date_to, sum(p_max) as sum_pmax, min(p_min) as min_pmin
from allintervals inter
join my_table t on inter.id = t.id and (
    inter.date_from between t.date_from and t.date_to-1
    or 
    inter.date_from between t.date_from and t.date_to-1
    )
group by inter.id, inter.date_from, inter.date_to
order by id, inter.date_from
;


1   10/11/21    01/05/22    450 20
1   01/05/22    10/10/22    200 20
1   10/10/22    01/12/22    250 40
1   01/12/22    02/02/23    550 40
1   02/02/23    18/06/27    450 40
p3consulting
  • 2,721
  • 2
  • 12
  • 10
  • This also works fine, but it's the same issue as with astentx answer. If MAX(date_to) is NULL (my bad, I forgot to mention about it) then it doesn't return correct values – MichalAndrzej Jan 27 '23 at 07:23
  • So do your home work: add a NVL with a far future date on date_to in CTE "alldates" and a NULLIF in the final SELECT. – p3consulting Jan 27 '23 at 08:37
2

You may use model clause to reference values of other rows and calculate such totals.

The idea behind this solution is to calculate new end dates for each interval (as long as each interval has no gaps a new end date is a next start date). And then calculate total for intersection of this interval with all original intervals.


select distinct
  date_from,
  to_ as date_to,
  sum_pmax,
  min_pmin
from my_table
model
  partition by (id)
  dimension by (
    date_from, date_to
  )
  measures (
    p_min, p_max,
    /*New result values*/
    0 as min_pmin, 0 as sum_pmax,
     /*New value of date_to*/
    date_from as to_,
    /*Auxiliary date_from to avoid cycle reference*/
    date_from as dummy_nocycle
  )
  rules update (
    /*Each new interval starts an new value of date_from,
    so it will be reused. The end of each interval is
    the next date_from*/

    /*Calculate new date_to as the nearest date_from
    of subsequent interval. Here we use a copy of date_from
    as a measure to avoid cyclic reference and be able to access it*/
    to_[any, any] = coalesce(min(dummy_nocycle)[date_from > cv(date_from), date_to > cv(date_from)], cv(date_to)),
    /*Then aggregate measures: calculate total for all intervals that intersect
    with the current one (with new date_to)*/
    sum_pmax[any, any] = sum(p_max)[date_from < to_[cv(), cv()], date_to > cv(date_from)],
    min_pmin[any, any] = min(p_min)[date_from < to_[cv(), cv()], date_to > cv(date_from)]
  )
order by 1, 2
DATE_FROM DATE_TO SUM_PMAX MIN_PMIN
2021-11-10 2022-05-01 450 20
2022-05-01 2022-10-10 200 20
2022-10-10 2022-12-01 250 40
2022-12-01 2023-02-02 550 40
2023-02-02 2027-06-18 450 40
astentx
  • 6,393
  • 2
  • 16
  • 25
  • This solution looks nice, but I forgot to mention that MAX(date_to) can be NULL (that means object lasts to "infinity"). In that case it wasnt working, I'll try to update yours code myself. If you know what to change, please share as I'm not familiar with *model* clause – MichalAndrzej Jan 27 '23 at 07:20
  • @MichalAndrzej Then you may use `dimension by (date_from, coalesce(date_to, date '9999-12-31') as date_to)` and use the above code – astentx Jan 27 '23 at 08:21