1

I need to select rows with start and end dates and if some dates overlap check if the rest of the row is the same then merge the rows with min(startdate) and max(startdate)? I think I first need to group rows that overlap and then I can do a group by that.

Each row have an ID, start_date, end_date and some data. Some rows date range overlaps and some don't, I want to merge those that have the same ID, data and have a date range that overlaps.

When tried only the two top rows with the suggested answer I got the three rows that are last in the question.

id      valid_from  valid_to
900101  06-MAY-13   02-FEB-14
900101  03-FEB-14   23-JUL-14
900102  01-JAN-10   01-DEC-10
900102  01-JAN-11   23-JAN-13
900102  01-AUG-11   23-JAN-15
900102  01-SEP-11   15-DEC-14

After a run it should be:

id      valid_from  valid_to
900101  06-MAY-13   02-FEB-14
900101  03-FEB-14   23-JUL-14
900102  01-JAN-10   01-DEC-10
900102  01-JAN-11   23-JAN-15  

Where the three bottom rows merged.

With only the two top rows the suggested code returned this:

900101  06-MAY-13   02-FEB-14 
900101  06-MAY-13   23-JUL-14 
900101  03-FEB-14   23-JUL-14
Anteus
  • 21
  • 4
  • 1
    What are your tables and example data (please include DDL and DML statements)? What is your expected result set? What have you tried? – MT0 Mar 01 '16 at 15:23
  • You might get some ideas in [tag:gaps-and-islands]. – Alex Poole Mar 01 '16 at 15:39
  • think we need more info. so it is more than select min(start_dt), max(end_date), col_one, col_two from mytable group by col_one, col_two – Bryan Dellinger Mar 01 '16 at 16:07
  • Thanks for the heads up! I added more info and I will check out gaps-and-islands! – Anteus Mar 01 '16 at 16:20
  • so what happens when you do select min(start_date), max(end_date), id, data from mytable group by id, data ? – Bryan Dellinger Mar 01 '16 at 16:47
  • I upvoted this question because it is fine even without the details added in the second paragraph. The problem is general and has substantial academic literature around it. – Paul A Jungwirth Mar 01 '16 at 16:49
  • Doesn't that suggest that the question doesn't show any research effort? – Alex Poole Mar 01 '16 at 18:00
  • @user2744722 rows with the same id maybe doesn't contain the same data and/or not be valid at the same time and should not be merged so I guess that doesn't work. If I also had a row with dates like from: 3-FEB-14 to: 26-AUG-15 then it shouldn't merge those two. – Anteus Mar 01 '16 at 19:55
  • Can you add some more data showing various combinations of standalone and overlapping scenarios, and the result you expect from that data? – Alex Poole Mar 01 '16 at 20:01
  • @AlexPoole Just added some! Will add more as well! – Anteus Mar 01 '16 at 20:17
  • I started to write up an answer and then realised it's very similar to [an old answer](http://stackoverflow.com/a/22074123/266304) - you should just be able to just change the table and column names. If that works I'll close this as a duplicate. – Alex Poole Mar 01 '16 at 20:25
  • 1
    @AlexPoole I didn't say that literature was easy to find. :-) Perhaps this question will make it easier. – Paul A Jungwirth Mar 01 '16 at 22:31

2 Answers2

0

If you are writing tables with start_date and end_date, you would probably benefit from reading Developing Time-Oriented Database Applications in SQL by Richard Snodgrass. People have studied questions like yours for 20+ years, and this is a great intro to the academic literature for working programmers. You can get a used copy on Amazon or read it for free online (in the "Books" section).

Your specific question is addressed in section 6.5. For instance given this table:

   ssn    |  pcn   | start_date | end_date
----------+--------+------------+-----------
111223333 | 120033 | 1996-01-01 | 1996-06-01
111223333 | 120033 | 1996-04-01 | 1996-10-01
111223333 | 120033 | 1996-04-01 | 1996-10-01
111223333 | 120033 | 1996-10-01 | 1998-01-01
111223333 | 120033 | 1997-12-01 | 1998-01-01

You can merge adjacent/overlapping time periods and remove duplicates with this SQL (slightly adapted from the book to use a CTE instead of a temp table):

WITH temp AS (
  SELECT ssn, pcn, start_date, end_date
  FROM   incumbents
)
SELECT DISTINCT f.ssn, f.pcn, f.start_date, l.end_date
FROM   temp AS f,
       temp AS l
WHERE  f.start_date < l.end_date
AND    f.ssn = l.ssn
AND    f.pcn = l.pcn
AND NOT EXISTS (SELECT 1
                FROM   temp AS m
                WHERE  m.ssn = f.ssn
                AND    m.pcn = f.pcn
                AND    f.end_date < m.start_date
                AND    m.start_date < l.start_date
                AND NOT EXISTS (SELECT 1
                                FROM   temp AS t1
                                WHERE  t1.ssn = f.ssn
                                AND    t1.pcn = f.pcn
                                AND    t1.start_date < m.start_date
                                AND    m.start_date <= t1.end_date))
AND NOT EXISTS (SELECT 1
                FROM   temp AS t2
                WHERE  t2.ssn = f.ssn
                AND    t2.pcn = f.pcn
                AND    ((t2.start_date < f.start_date
                         AND f.start_date <= t2.end_date)
                OR      (t2.start_date <= l.end_date
                         AND l.end_date < t2.end_date)))

That is in the Postgres dialect but I'm sure you can adapt it to Oracle (or any other database). Also, you should change ssn and pcn to whatever key you're using (possibly id, as long as the same id is allowed to appear in multiple records at different times).

Paul A Jungwirth
  • 23,504
  • 14
  • 74
  • 93
  • Thank you for the book recommendation and code! But when I try it with the two rows I have added to the question I get three rows back (also added). – Anteus Mar 01 '16 at 19:40
  • I originally left off the bottom `NOT EXISTS` clause, so I would make sure you are copying the whole query after refreshing your browser. – Paul A Jungwirth Mar 01 '16 at 22:27
  • I did test running this myself in a Postgres database though, and it appears to give the correct results for me. – Paul A Jungwirth Mar 01 '16 at 22:28
0

This will work in Oracle using hierarchical queries and will query the original data only twice

WITH d AS
 (
  --
  SELECT DATE '2016-01-01' effective_start_date, DATE '2016-02-01' - 1 effective_end_date, 1 contract_id
    FROM dual
  UNION ALL --
  SELECT DATE '2016-02-01', DATE '2016-04-01' - 1, 1
    FROM dual
  UNION ALL --
  SELECT DATE '2016-04-01', DATE '2016-04-30', 1
    FROM dual
  UNION ALL --

  SELECT DATE '2016-06-01', DATE '2016-07-01' - 1, 1
    FROM dual
  UNION ALL -- gap
  SELECT DATE '2016-07-01' + 1, DATE '2016-07-31', 1
    FROM dual
  UNION ALL --
  -- other contract
  SELECT DATE '2016-02-01', DATE '2016-03-01' - 1, 3
    FROM dual
  UNION ALL --
  SELECT DATE '2016-03-01', DATE '2016-03-31', 3
    FROM dual
  --
  ),
q1 AS
 (
  -- walk the chain backwards and get the "root" start
  SELECT d.*, connect_by_root effective_start_date contract_start, LEVEL
    FROM d

  CONNECT BY PRIOR contract_id = contract_id
         AND PRIOR effective_end_date + 1 = effective_start_date),
q2 AS
 (
  -- walk the chain forward and get the "root" end
  SELECT d.*, connect_by_root effective_end_date contract_end, LEVEL
    FROM d -
  CONNECT BY PRIOR contract_id = contract_id
         AND PRIOR effective_start_date = effective_end_date + 1)
-- join the forward and backward data to get the contiguous contract start and ed
SELECT DISTINCT MIN(a.contract_start) contract_start, MAX(b.contract_end) contract_end, a.contract_id
  FROM q1 a
  JOIN q2 b
    ON a.contract_id = b.contract_id
   AND a.effective_start_date = b.effective_start_date
 GROUP BY a.effective_start_date, a.effective_end_date, a.contract_id

and it gives the desired result

+-----+----------------+--------------+-------------+
|     | CONTRACT_START | CONTRACT_END | CONTRACT_ID |
+-----+----------------+--------------+-------------+
|   1 | 2016-01-01     | 2016-04-30   |           1 |
|   2 | 2016-06-01     | 2016-06-30   |           1 |
|   3 | 2016-07-02     | 2016-07-31   |           1 |
|   4 | 2016-02-01     | 2016-03-31   |           3 |
+-----+----------------+--------------+-------------+
user103716
  • 190
  • 12