2

I have a table with multiple entries. One entry consists of start Datetime and end datetime.

I want to find clusters of entries in such way that:

If an entry starts before the prior entry ends then both are part of the cluster. It is some kind of overlapping problem.

Example:

id      start                    end
1       2007-04-11 15:34:02      2007-05-11 13:09:01
2       2007-06-13 15:42:39      2009-07-21 11:30:00
3       2007-11-26 14:30:02      2007-12-11 14:09:07
4       2008-02-14 08:52:11      2010-02-23 16:00:00

I want output of

id      start                    end
1       2007-04-11 15:34:02      2007-05-11 13:09:01
2-4     2007-06-13 15:42:39      2010-02-23 16:00:00

I had a solution that sorts start and then does some calculations with rownumber and lag/lead and so on. The problem is the special case where line 4 does come directly after line 2 and so I don't recognize it...

Is there a good solution in sql here? Maybe I am missing something?

ekad
  • 14,436
  • 26
  • 44
  • 46
tuxmania
  • 906
  • 2
  • 9
  • 28

3 Answers3

1

Ok here is some solution with recursive cte:

CREATE TABLE t
(
    id INT,
    s  DATE,
    e  DATE
);

INSERT INTO t
VALUES (1, '20070411', '20070511'),
       (2, '20070613', '20090721'),
       (3, '20071126', '20071211'),
       (4, '20080214', '20100223');

WITH cte AS (
    SELECT id, s, e, id AS rid, s AS rs, e AS re
    FROM t
    WHERE NOT EXISTS(
            SELECT *
            FROM t ti
            WHERE t.s > ti.s
              AND t.s < ti.e
        )
    UNION ALL
    SELECT t.*, c.rid, c.rs,
       CASE
           WHEN t.e > c.re THEN t.e
           ELSE c.re
           END
    FROM t
    JOIN cte c ON t.s > c.s AND t.s < c.e
)
SELECT min(id) minid,
       max(id) maxid,
       min(rs) startdate,
       max(re) enddate
FROM cte
GROUP BY rid

Output:

minid   maxid   startdate   enddate
1       1       2007-04-11  2007-05-11
2       4       2007-06-13  2010-02-23

Fiddle http://sqlfiddle.com/#!6/2d6d3/10

crishoj
  • 5,660
  • 4
  • 32
  • 31
Giorgi Nakeuri
  • 35,155
  • 8
  • 47
  • 75
  • Works. Not very efficient though but it works :). I hoped it could be done with windowed functions.. but it is ok. – tuxmania Apr 17 '15 at 14:13
0

Try this...

select a.id ,a.start,a.end,b.id,b.start,b.end
from   tab   a
cross join tab b
where  a.start between b.start and b.end
order by a.start, a.end

We will have to check each row against all other rows, just like using a loop and an inner loop. for this purpose we do a cross join.

Then we will check the overlapping using the BETWEEN AND Operator

0

To answer this question, you want to determine which times start a new group. Then, before each time start, count the number of such starts to define a group -- and aggregate by this value.

Assuming you have no duplicate times, this should work to set the flag:

select e.*,
       (case when not exists (select 1
                              from entries e2
                              where e2.start < e.start and e2.end > e.start
                             )
             then 1 else 0
        end) as BeginsIsland
from entries e;

The following then does the cumulative sum and aggregation, assuming SQL Server 2012+ (this can easily be adapted to earlier versions, but this is easier to code):

with e as (
      select e.*,
             (case when not exists (select 1
                                    from entries e2
                                    where e2.start < e.start and e2.end > e.start
                                   )
                       then 1 else 0
              end) as BeginIslandFlag
      from entries e
     )
select (case when min(id) = max(id) then cast(max(id) as varchar(255))
             else cast(min(id) as varchar(255)) + '-' + cast(max(id) as varchar(255))
        end) as ids,
       min(start) as start, max(end) as end
from (select e.* sum(BeginIslandFlag) over (order by start) as grp
      from e
     ) e
group by grp;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786