How to "union" overlapping range to non-overlapping range?

Question

Question: Can anyone suggest a better or more pythonic approach to reducing overlapping range pairs to non-overlapping range pairs?

Background: I have a list of tuples representing start and end pairs. I am essentially trying to compute a union of all the start-end pairs. The input pairs overlap one another, and the output should represent the same ranges without any overlap.

The code below is close but wrong, as it outputs an extra range that was not in the input (I also realize it is not very good, and I know why it's wrong). Can anyone suggest a better approach, or some built-in function I overlooked?

Apologies for the basic question. Thanks for the help!

##create example data
pairA =[(0,5),(10,12)]
pairB =[(1,2),(11,15)]
pairC =[(1,4),(10,12),(15,17)]

#combine the lists to one list
#ultimately may have n number of lists and felt it would be easier to
merged = pairA + pairB +pairC
# produce union of list * unpacks the arguments of a list
listUnion= sorted(set().union(*merged))

#this is the piece of code I am looking at improving
#it creates new start end pairs based on the union
lastElement =listUnion[-1]
outList=[]

for item in listUnion:
    #create start end pair from value i and i+1
    if item != lastElement:
        outList.append((item,listUnion[listUnion.index(item)+1]))
    else:
        #last element of the list, becomes the last element of list pair
        #it can be ignored
        pass
print outList 
"""output: [(0, 1), (1, 2), (2, 4), (4, 5), (5, 10), (10, 11), (11, 12), (12, 15), (15, 17)]
correct output: would not have (5, 10), as there is no overlap here in the input"""

Edit: Added this visual representation of the problem

Look up the `Interval` class. The abstraction will likely free you from the work you're doing now, and lead directly to this solution and many more. — Prune, Oct 18 '18 at 18:14
`(0,5)` is a superset of `(1,2)`. Why wouldn't you just discard `(1,2)` entirely? — John Gordon, Oct 18 '18 at 18:44
@JohnGordon the "breakpoints" are required at a later point. — db_newb, Oct 18 '18 at 19:03
What would the output of `[(0,5),(1,4),(2,3)]` be? Would it be `[(0,1),(1,2),(2,3),(3,4),(4,5)]`? BTW, here and in your example, there are more tuples in the output than in the input. — Walter Tross, Oct 18 '18 at 20:28
@WalterTross I understand what you are getting at. I expect more output than input: [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]. For a bit more background: pairA, pairB and pairC have no self-overlap. It is only together that they overlap, and my interest is in where they do and do not overlap. In my mind this is a union between ranges. — db_newb, Oct 18 '18 at 20:54

Walter Tross · Accepted Answer · 2018-10-19T09:50:21.883

Here is a solution. It's probably not very pythonic, because my experience with Python is very limited, but it works.

pairs_a = [(0, 5), (10, 12)]
pairs_b = [(1, 2), (11, 15)]
pairs_c = [(1, 4), (10, 12), (15, 17)]

merged = pairs_a + pairs_b + pairs_c
merged.sort()

set_list = []
cur_set = set()
cur_max = merged[0][1]
for pair in merged:
    p0, p1 = pair
    if cur_max < p0:
        set_list.append(cur_set)
        cur_set = set()
    cur_set.add(p0)
    cur_set.add(p1)
    if cur_max < p1:
        cur_max = p1
set_list.append(cur_set)

out_list = []
for my_set in set_list:
    my_list = sorted(my_set)
    p0 = my_list[0]
    for p1 in my_list[1:]:
        out_list.append((p0, p1))
        p0 = p1

# more pythonic but less readable in spite of indentation efforts
# (names chosen so the built-ins list and set are not shadowed):
# out_list = [pair
#             for zipped in [zip(points[:-1], points[1:])
#                            for points in [sorted(s)
#                                           for s in set_list]]
#             for pair in zipped]

# alternate ending:
# out_list = [sorted(s) for s in set_list]

print(out_list)

The idea is to sort all range pairs by their start. This is what `merged.sort()` does (it uses successive tuple members to break ties, but this is unimportant here). Then we loop over the sorted range pairs, and as long as we are within a bunch of overlapping ranges, we add all starts and ends to the current set. In order to know when the bunch ends, we keep the max of all range ends seen so far. As soon as a range start arrives that is beyond this max, we store away the current set by appending it to a list, and begin a new one. The last set has to be added to the list after the loop. Now we have a list of sets, which we can easily translate to a list of lists or to a list of pairs.
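For reuse, the same sweep can be wrapped in a function. This is only a sketch of the idea described above, not a tested library routine: the name `split_overlaps` is made up for illustration, and it assumes every pair has start <= end. Unlike the script version, it returns an empty list for empty input instead of failing on `merged[0]`.

```python
def split_overlaps(pairs):
    """Split overlapping (start, end) pairs at every breakpoint.

    Hypothetical helper; assumes start <= end for every pair.
    """
    out = []
    points = set()   # breakpoints collected for the current bunch
    cur_max = None   # largest end seen so far

    def flush():
        # turn the sorted breakpoints of one bunch into adjacent pairs
        ordered = sorted(points)
        out.extend(zip(ordered[:-1], ordered[1:]))

    for start, end in sorted(pairs):
        if cur_max is not None and start > cur_max:
            # a gap: the current bunch is complete, start a new one
            flush()
            points = set()
        points.update((start, end))
        cur_max = end if cur_max is None else max(cur_max, end)

    flush()  # the last bunch is still pending after the loop
    return out


merged = [(0, 5), (10, 12), (1, 2), (11, 15), (1, 4), (10, 12), (15, 17)]
print(split_overlaps(merged))
# [(0, 1), (1, 2), (2, 4), (4, 5), (10, 11), (11, 12), (12, 15), (15, 17)]
```

A pair with start == end that stands alone contributes a single breakpoint and so produces no output pair, which matches the behaviour of the code above.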

Thanks, Walter. I will tweak the code and see if I can get it right. It produces an extra pair, (17, 10), which spans some of the other output pairs, but I appreciate the alternative approach. — db_newb, Oct 18 '18 at 22:05
Oh, it does NOT on my Python 3.7.0. Which Python do you have? — Walter Tross, Oct 18 '18 at 22:08
I think I just fixed it by replacing `list(my_set)` with `sorted(my_set)`. Apparently whether (some?) sets appear sorted when converted to lists or, possibly and more generally, when iterated over, depends on the Python version. — Walter Tross, Oct 18 '18 at 22:13
Wow, thanks. I should have specified that this had to be in 2.7. You tweaked it before I could. If I ever figure out a pythonic way, I will send it your way. — db_newb, Oct 18 '18 at 22:27
good. I also added an "alternative ending", a "list of lists" output, which you could find useful. — Walter Tross, Oct 18 '18 at 22:28
oh, is a lonely range with start == end valid? Because if it is, I have to add a few lines of fix to the code. Currently ranges with start == end are discarded. — Walter Tross, Oct 18 '18 at 22:43

score 0 · Answer 2 · answered Oct 18 '18 at 18:11

Not sure of your environment constraints, but if you don't have any, you might want to consider intervaltree (https://pypi.org/project/intervaltree/), in particular:

result_tree = tree.union(iterable)

Shervin

I will look into intervaltree – db_newb Oct 18 '18 at 18:59

score -1 · Answer 3 · answered Oct 18 '18 at 18:23

Could you clarify the problem, please? I see that [(0,5), (1,2)] produces [(0, 1), (1, 2), (2, 5)]. What would [(0,5), (1,5)] produce: [(0, 1), (1, 5), (5, 5)], or just [(0,1)], or something else?

Tumbislav

Sorry for the lack of clarity: [(0,5), (1,5)] would produce [(0,1), (1,5)]. – db_newb Oct 18 '18 at 18:55
