-1

The problem I am facing is how to find distinct time periods from multiple time periods with overlap in Teradata ANSI SQL.

For example, the attached tables contain multiple overlapping time periods, how can I combine those time periods into 3 unique time periods in Teradata SQL???

I think I can do it in python with the loop function, but not sure how to do it in SQL

ID Start Date End Date
001 2005-01-01 2006-01-01
001 2005-01-01 2007-01-01
001 2008-01-01 2008-06-01
001 2008-04-01 2008-12-01
001 2010-01-01 2010-05-01
001 2010-04-01 2010-12-01
001 2010-11-01 2012-01-01

My expected result is:

ID start_Date end_date
001 2005-01-01 2007-01-01
001 2008-01-01 2008-12-01
001 2010-01-01 2012-01-01
Yumeng Xu
  • 179
  • 1
  • 2
  • 11
  • Why do you say you are using PostgreSQL and then you change the `postegresql` tag to `oracle`? Which of the two DBMS are you really using? – Thorsten Kettner Mar 14 '22 at 22:14

4 Answers4

3

This is a gaps and islands problem. Try this:

with u as 
(select ID, start_date, end_date,
case 
when start_date <= lag(end_date) over(partition by ID order by start_date, end_date) then 0 
else 1 end as grp
from table_name),
v as
(select ID, start_date, end_date,
sum(grp) over(partition by ID order by start_date, end_date) as island
from u)
select ID, min(start_date) as start_Date, max(end_date) as end_date
from v
group by ID, island;

Fiddle

Basically you can identify "islands" by comparing start_date of current row to end_date of previous row (ordered by start_date, end_date), if it precedes it then it's the same island. Then you can do a rolling sum() to get the island numbers. Finally select min(start_date) and max(end_date) from each island to get the desired output.

Zakaria
  • 4,715
  • 2
  • 5
  • 31
  • @YumengXu but it works fine in the embedded fiddle link. Can you reproduce the issue there? – Zakaria Mar 14 '22 at 20:45
  • @YumengXu I'm not familiar with Teradata, but I'm pretty sure it's a different product than Postgresql. The query should work in Teradata nevertheless. You can check my post's edit history by the way. – Zakaria Mar 14 '22 at 23:14
  • Hi Zakaria, actually IT just re-confirmed the SQL I am using, it is ANSI SQL.. – Yumeng Xu Mar 14 '22 at 23:54
  • @YumengXu This answer does not always work as sometimes the current row does not overlap the previous row but it overlaps the row one-or-more before that. In this [db<>fiddle](https://dbfiddle.uk/?rdbms=oracle_21&fiddle=5aae99a8cf7daa1e3c40fcf346e13e3f) there should only be one row where `id` is `002` but this answer outputs 3 rows. – MT0 Mar 15 '22 at 00:34
  • @MT0, yeah, you are right! – Yumeng Xu Mar 15 '22 at 00:37
3

From Oracle 12, you can use MATCH_RECOGNIZE to perform a row-by-row comparison:

SELECT *
FROM   table_name
MATCH_RECOGNIZE(
  PARTITION BY id
  ORDER BY start_date
  MEASURES
    FIRST(start_date) AS start_date,
    MAX(end_date) AS end_date
  ONE ROW PER MATCH
  PATTERN (overlapping_ranges* last_range)
  DEFINE overlapping_ranges AS NEXT(start_date) <= MAX(end_date)
)

Which, for the sample data:

CREATE TABLE table_name (ID, Start_Date, End_Date) AS
SELECT '001', DATE '2005-01-01', DATE '2006-01-01' FROM DUAL UNION ALL
SELECT '001', DATE '2005-01-01', DATE '2007-01-01' FROM DUAL UNION ALL
SELECT '001', DATE '2008-01-01', DATE '2008-06-01' FROM DUAL UNION ALL
SELECT '001', DATE '2008-04-01', DATE '2008-12-01' FROM DUAL UNION ALL
SELECT '001', DATE '2010-01-01', DATE '2010-05-01' FROM DUAL UNION ALL
SELECT '001', DATE '2010-04-01', DATE '2010-12-01' FROM DUAL UNION ALL
SELECT '001', DATE '2010-11-01', DATE '2012-01-01' FROM DUAL;

Outputs:

ID START_DATE END_DATE
001 2005-01-01 00:00:00 2007-01-01 00:00:00
001 2008-01-01 00:00:00 2008-12-01 00:00:00
001 2010-01-01 00:00:00 2012-01-01 00:00:00

db<>fiddle here


Update: Alternative query

SELECT id,
       start_date,
       end_date
FROM   (
  SELECT id,
         dt,
         SUM(cnt) OVER (PARTITION BY id ORDER BY dt) AS grp,
         cnt
  FROM   (
    SELECT ID,
           dt,
           SUM(type) OVER (PARTITION BY id ORDER BY dt, ROWNUM) * type AS cnt
    FROM   table_name
    UNPIVOT (dt FOR type IN (start_date AS 1, end_date AS -1))
  )
  WHERE  cnt IN (1,0)
)
PIVOT (MAX(dt) FOR cnt IN (1 AS start_date, 0 AS end_date))

Or, an equivalent that does not use UNPIVOT, PIVOT or ROWNUM and works in both Oracle and PostgreSQL:

SELECT id,
       MAX(CASE cnt WHEN 1 THEN dt END) AS start_date,
       MAX(CASE cnt WHEN 0 THEN dt END) AS end_date
FROM   (
  SELECT id,
         dt,
         SUM(cnt) OVER (PARTITION BY id ORDER BY dt) AS grp,
         cnt
  FROM   (
    SELECT ID,
           dt,
           SUM(type) OVER (PARTITION BY id ORDER BY dt, rn) * type AS cnt
    FROM   (
      SELECT r.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt ASC, type DESC) AS rn
      FROM   (
        SELECT id, 1 AS type, start_date AS dt FROM table_name
        UNION ALL
        SELECT id, -1 AS type, end_date AS dt FROM table_name
      ) r
    ) p
  ) s
  WHERE  cnt IN (1,0)
) t
GROUP BY id, grp

Update 2: Another Alternative

SELECT id,
       MIN(start_date) AS start_date,
       MAX(end_Date) AS end_date
FROM   (
  SELECT t.*,
         SUM(CASE WHEN start_date <= prev_max THEN 0 ELSE 1 END)
           OVER (PARTITION BY id ORDER BY start_date) AS grp
  FROM   (
    SELECT t.*,
           MAX(end_date) OVER (
             PARTITION BY id ORDER BY start_date
             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS prev_max
    FROM   table_name t
  ) t
) t
GROUP BY id, grp

db<>fiddle Oracle PostgreSQL

MT0
  • 143,790
  • 11
  • 59
  • 117
  • Hi MT0, I made a big mistake, since I am using Teradata studio, and my friend told me that it is under Oracle SQL, but in fact, it is based on Postgre SQL. Could you please introduce a solution in Postgre SQL? – Yumeng Xu Mar 14 '22 at 22:23
  • Your Oracle solution is elegant !! – Yumeng Xu Mar 14 '22 at 22:27
  • could you introduce an ANSI SQL solution? I was told that ANSI SQL is supported by Oracle, but it doest have all functions in Oracle SQL – Yumeng Xu Mar 14 '22 at 23:58
  • @YumengXu It is not a reserved word and you can replace it with something else (you just need to replace every instance). [PostgreSQL Fiddle](https://dbfiddle.uk/?rdbms=postgres_14&fiddle=85658b94a2fa51f810c4174b0021f2e4) [Oracle Fiddle](https://dbfiddle.uk/?rdbms=oracle_21&fiddle=37f8a81b7cf95d09b9df22e6287d81e5) – MT0 Mar 15 '22 at 00:58
  • Hi MT0, Thank you so much. Both of your solutions are elegant. – Yumeng Xu Mar 15 '22 at 01:30
-1

This may work ,with little bit of change in function , I tried it in Dbeaver :

select ID,Start_Date,End_Date
from 
(
select t.*,
dense_rank () over(partition by extract (year from Start_Date) order BY End_Date desc) drnk
from testing_123 t
) temp
where temp.drnk = 1
ORDER BY Start_Date;

DB08
  • 151
  • 6
  • This only works because the 3 periods seem to be in a different year? see: [DBFIDDLE](https://dbfiddle.uk/?rdbms=postgres_14&fiddle=205dbe98a80edd4c93e5bb23a55d0ba2), where I would expect 4 periods as answer) – Luuk Mar 14 '22 at 17:23
  • Hi DB08, After I ran the code, I can find overlap time periods under the same ID. – Yumeng Xu Mar 14 '22 at 17:26
-1

Try this

WITH a as (
  SELECT
    ID,
    LEFT(Start_Date, 4) as Year,
    MIN(Start_Date) as New_Start_Date
  FROM
    TAB1
  GROUP BY
    ID,
    LEFT(Start_Date, 4)
), b as (
  SELECT 
    a.ID,
    Year,
    New_Start_Date,
    End_Date
  FROM
    a
  LEFT JOIN
    TAB1
   ON LEFT(a.New_Start_Date, 4) = LEFT(TAB1.Start_Date, 4)
)
select 
  ID,
  New_Start_Date as Start_Date,
  MAX(End_Date)
from 
  b
GROUP BY
  ID,
  New_Start_Date;

Example: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=97f91b68c635aebfb752538cdd752ace

  • Hi elfrederino, After I ran your code. some extremely large end dates were created randomly. Since I cant use the LEFT(TIME,4) function in Oracle SQL, when I use SUBSTR(Time,1,2) or EXTRACT(YEAR from TIME) to replace the left function error occurs. – Yumeng Xu Mar 14 '22 at 18:56
  • Please don't post code-only answers. Why have us guess how your query works? The, `Start_Date` and `End_Date` are supposed to be dates, so why do you invoke string functions on them? Thus the DBMS must convert the dates to strings first and as this is implicit, it depends on session settings what the string will look like. It can be 'Oct 4, 2020' for instance in which case the `LEFT` function returns 'Oct ' as year. – Thorsten Kettner Mar 14 '22 at 22:09