0

I have a table that contains the following intervals of big integers:

start_point end_point
4999999995 4999999995
5000000000 5000000009
5000000000 5099999999
5000000010 5000000010
5000000011 5000000025
5000000026 5000000045
5090090903 5090090903
6000000000 6000000000

What I need is a TSQL query that merges these intervals, so they would be disjoint without any overlaps. In this case the result I am looking for is

start_point end_point
4999999995 4999999995
5000000000 5099999999
6000000000 6000000000

3 Answers3

4

Usually it is done with lag() or lead() functions, but here I was not able to make them work. So I used subquery in exists clause and it gives correct output:

with 
numbered as (
  select start_point, end_point, row_number() over (order by start_point, end_point) rn 
  from my_table),
grouped1 as (
  select start_point, end_point, rn,
         case when exists (
             select 1 from numbered b 
             where b.rn < a.rn and b.end_point + 1 >= a.start_point) 
         then 0 else 1 end as grp
  from numbered a ),
grouped2 as (
  select start_point, end_point, sum(grp) over (order by start_point, end_point) grp 
  from grouped1)
select min(start_point) sp, max(end_point) ep from grouped2 group by grp

dbfiddle demo

Another possible solution is recursive query, which also returned correct output, but recursion is slow, so I am not posting it here. You can see it in attached dbfiddle.

Ponder Stibbons
  • 14,723
  • 2
  • 21
  • 24
2

You can use a recursive approach to merge the intervals into a given table and make them disjoint, that is, there will be no overlap between the merged intervals :

 WITH CTE AS (
      SELECT start_point, end_point,
        ROW_NUMBER() OVER (ORDER BY start_point) AS rn
      FROM YourTableName
    ), RecursiveCTE AS (
      SELECT start_point, end_point, rn, end_point AS new_end_point
      FROM CTE
      WHERE rn = 1
    
      UNION ALL
    
      SELECT c.start_point, c.end_point, c.rn,
        CASE
          WHEN c.start_point <= r.new_end_point THEN r.new_end_point
          ELSE c.end_point
        END
      FROM CTE AS c
      JOIN RecursiveCTE AS r ON r.rn = c.rn - 1
      WHERE c.start_point <= r.new_end_point
    )
    SELECT start_point, MAX(new_end_point) AS end_point
    FROM RecursiveCTE
    GROUP BY start_point
    ORDER BY start_point;
Amira Bedhiafi
  • 8,088
  • 6
  • 24
  • 60
0

Thank you for everyone's effort on trying to answer my question. In the end I have managed to find a solution that uses a cursor. I'm aware of the fact that cursors are slow, but it still turned out to be faster than using nested selects or recursive CTEs. Below is my solution

    DECLARE @start_point BIGINT, @end_point BIGINT;

    DECLARE @merged_intervals TABLE (start_point BIGINT, end_point BIGINT);

    DECLARE interval_cursor CURSOR FAST_FORWARD
    FOR
    SELECT start_point, end_point
    FROM myTable
    ORDER BY start_point;

    OPEN interval_cursor;
    FETCH NEXT FROM interval_cursor INTO @start_point, @end_point;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        DECLARE @merged_start_point BIGINT, @merged_end_point BIGINT;
        SET @merged_start_point = @start_point;
        SET @merged_end_point = @end_point;
        FETCH NEXT FROM interval_cursor INTO @start_point, @end_point;
        WHILE @@FETCH_STATUS = 0 AND @start_point <= @merged_end_point
        BEGIN
            IF @end_point > @merged_end_point
                SET @merged_end_point = @end_point;
            FETCH NEXT FROM interval_cursor INTO @start_point, @end_point;
        END;
        INSERT INTO @merged_intervals (start_point, end_point)
        VALUES (@merged_start_point, @merged_end_point);
    END;

    CLOSE interval_cursor;
    DEALLOCATE interval_cursor;