-1

I need to group order together with crossing their date ranges only

scenario A.

  1. order 1, 1.3.2020-30.6.2020
  2. order 2, 1.5.2020-31.8.2020
  3. order 3, 31.7.2020-31.10.2020
  4. order 4, 31.7.2020-31.12.2020

so the output should be

  1. order 1, order 2
  2. order 2, order 3, order 4

order1,3,4 are not grouped because their ranges don't cross at all

scenario B.

same as above plus another order

  1. order 5, 1.1.2020-31.12.2020

so output will be

  1. order 1, order 2, order 5
  2. order 2, order 3, order 4, order 5

I tried Self Join to check which start date falls in that range. so in the range of order 1 falls only the start date of order 2 -> we have one group then in the range of order 2 fall both start dates of order 3 and 4 -> we have second group but then for order 3 falls start date of order 4 and opposite -> that will give another 2 groups but they are invalid because order 2 is crossing their date ranges as well and shoul be included as well and becuase there will be 3 douplicates we should display it just once as in the desired output but this approach will fail.

Thanks

APC
  • 144,005
  • 19
  • 170
  • 281
Pato
  • 153
  • 6
  • May be too difficult for me to understand but could you explain how the grouping done in the first case as `order 1, order 2` and `order 2, order 3, order 4` ? – Sujitmohanty30 Oct 10 '20 at 09:51
  • groups are done only for those orders wich date ranges are crossing, therefore group 1 are order1 and 2, group 2 are orders 2,3 and 4. in the group must be all order included with crossing date range. group with orders 1,2,3,4 will be invalid because date range or order 1 doesn't cross with date ranges or orders 3 and 4. similarly a group of order 3 and 4 will by invalid because order 2 must be included as well – Pato Oct 10 '20 at 10:23
  • 1
    Why is orders 3 & 4 an invalid group, if order 2 would have to be included, and why is order 2, 3 & 4 valid group when order 1 can be in the same group as order 2? I cannot reconcile the logic between these two statements. – MT0 Oct 10 '20 at 10:43
  • you can't have order 1 in the same group with orders 3 & 4 because of date ranges, order 1 ends on 30.6 but orders 3 & 4 start on 31.7, they don't cross each other. group of order 3 & 4 would be valid in case there is not another order (in this case order 2) which date range crossing both of them. I want only groups with orders whose date ranges overlaping each other. Therefore you can't have group 1,2,3,4 because not all of them crossing each other somehow and similarly goes for group of orders 3 & 4 because there will be missing order 2 which cross both of them. – Pato Oct 10 '20 at 11:03
  • Please in code questions give a [mre]--cut & paste & runnable code, including smallest representative example input as code; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. Give the least code you can that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) For SQL that includes DBMS & DDL (including constraints & indexes) & input as code in table format. [ask] Pause work on the overall goal, chop code to the 1st expression not giving what you expect & say what you expect & why. – philipxy Oct 11 '20 at 22:06
  • This is not clear. Use enough words, sentences & references to parts of examples to clearly & fully say what you mean. To describe a result: Say enough that someone could go away & come back with a solution & a user knows how to use it. When giving a business relation(ship)/association or table (base or query result), say what a row in it states about the business situation in terms of its column values. – philipxy Oct 11 '20 at 22:08

2 Answers2

0

You can use MATCH_RECOGNIZE to find groups where the next value's start date is before, or equal to, the end date of all the previous values in the group. Then you can aggregate and exclude groups that would be entirely contained in another group:

WITH groups ( id, ids, start_date, end_date ) AS (
  SELECT id,
         LISTAGG( grp_id, ',' ) WITHIN GROUP ( ORDER BY start_date ),
         MIN( start_date ),
         MIN( end_date )
  FROM   (
    SELECT t.id,
           x.id AS grp_id,
           x.start_date,
           x.end_date
    FROM   table_name t
           INNER JOIN table_name x
           ON (
                   x.start_date >= t.start_date
               AND x.start_date <= t.end_date
              )
  )
  MATCH_RECOGNIZE (
    PARTITION BY id
    ORDER BY start_date
    MEASURES
      MATCH_NUMBER() AS mno
    ALL ROWS PER MATCH
    PATTERN ( FIRST_ROW GROUPED_ROWS* )
    DEFINE GROUPED_ROWS AS (
      GROUPED_ROWS.start_date <= MIN( end_date )
    )
  )
  WHERE mno = 1
  GROUP BY id
)
SELECT id,
       ids
FROM   groups g
WHERE  NOT EXISTS (
  SELECT 1
  FROM   groups x
  WHERE  g.ID <> x.ID
  AND    x.start_date <= g.start_date
  AND    g.end_date   <= x.end_date
)

Which for the sample data:

CREATE TABLE table_name ( id, start_date, end_date ) AS
SELECT 'order 1', DATE '2020-03-01', DATE '2020-06-30' FROM DUAL UNION ALL
SELECT 'order 2', DATE '2020-05-01', DATE '2020-08-31' FROM DUAL UNION ALL
SELECT 'order 3', DATE '2020-07-31', DATE '2020-10-31' FROM DUAL UNION ALL
SELECT 'order 4', DATE '2020-07-31', DATE '2020-12-31' FROM DUAL;

Outputs:

ID      | IDS                    
:------ | :----------------------
order 2 | order 2,order 3,order 4
order 1 | order 1,order 2        

I you then:

INSERT INTO table_name ( id, start_date, end_date )
VALUES ( 'order 5', DATE '2020-01-01', DATE '2020-12-31' );

The output would be:

ID      | IDS                    
:------ | :----------------------
order 2 | order 2,order 3,order 4
order 5 | order 5,order 1,order 2

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
0

the result of MATCH_RECOGNIZE solution is incorrent because order 5 should be in both groups

I use some analitycal functions to solve this:

-- create table

Create table cross_dates (order_id number, start_date date , end_date date);

-- insert dates

insert into cross_dates values( 1, to_date('01.03.2020', 'dd.mm.yyyy'), to_date('30.06.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 2, to_date('01.05.2020', 'dd.mm.yyyy'), to_date( '31.08.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 3, to_date('31.07.2020', 'dd.mm.yyyy'), to_date( '31.08.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 4, to_date('31.07.2020', 'dd.mm.yyyy'), to_date( '31.10.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 5, to_date('01.01.2020', 'dd.mm.yyyy'), to_date( '31.12.2020', 'dd.mm.yyyy'));

-- SQL

select 'Order '|| min_order_id ||': ',  listagg( order_id, ',') within group (order by order_id)  list
from (
    select distinct min_order_id, order_id from (
        with  dates (cur_date, end_date, order_id, start_date) as (
              select start_date, end_date, order_id, start_date
              from cross_Dates
              union all
              select cur_date + 1, end_date, order_id,start_date
              from dates
              where cur_date < end_date )
    select d.order_id, 
           min(d.order_id) over(partition by greatest(d.start_date, cd.start_date)) min_order_id
    from dates d, cross_Dates cd
    where d.cur_date between cd.start_date and cd.end_date ))
group by min_order_id 
having count(*) > 1;

Result:

Order 1:    1,2,5
Order 2:    2,3,4,5

-- add new column and update old records

alter table cross_dates add (item varchar2(1)); 

update cross_dates set item = 'A'; 

-- insert new records B

insert into cross_dates values( 1, to_date('01.01.2020', 'dd.mm.yyyy'), to_date( '30.06.2020', 'dd.mm.yyyy'), 'B');
insert into cross_dates values( 1, to_date('01.07.2020', 'dd.mm.yyyy'), to_date( '31.12.2020', 'dd.mm.yyyy'), 'B');

My assumption:

  1. A and B are separate orders, not going in same groups even when crossing
  2. order 1 B - has two records as a continuations - in my understanding counts like one order : order 1 B 01.01.2020 - 21.12.2020

If my assumption are correct the SQL could look like this:

 select distinct min_order_id, order_id, item from (
           with  dates (cur_date, end_date, order_id, start_date, item) as (
              select start_date, end_date, order_id, start_date, item
              from cross_Dates             
              union all
              select cur_date + 1, end_date, order_id,start_date, item
              from dates
              where cur_date < end_date )
    select d.order_id,  d.item,
           min(d.order_id) over(partition by greatest(d.start_date, cd.start_date),d.item) min_order_id
    from dates d, cross_Dates cd
    where d.cur_date between cd.start_date and cd.end_date and d.item = cd.item )
    order by item, min_order_id; 

Result:

MIN_ORDER_ID ORDER_ID I


       1          1 A
       1          2 A
       1          5 A
       2          2 A
       2          3 A
       2          4 A
       2          5 A
       5          5 A
       1          1 B

If my assumption are not ok please provide me what result should look like i this case.

:)

  • nice job Monica and very interesting but let assume this modification. we can get rid of LISTAGG and simply list all orders row by row under the same group e.g Group 1 , 1 then Group 1 , 2 then Group 1 , 5 ......Group 2 , 2 then Group 2 , 3..... Assume there is one more column Item with values A and B. A has the orders as above and B has different orders , order 1 with 1.1.2020-30.6.2020 and again order 1 1.7.2020-31.12.2020(it is an extention therefore same ID). – Pato Oct 11 '20 at 11:16