I am looking for some Python code to efficiently compute interval overlaps. I've used the interval tree of the bx-python package before, but I now need to delete intervals from the tree (or, better yet, modify them). It seems the bx-python tree doesn't support this.

Any pointers?

Stephen Kennedy
buddahfist
  • I needed to do this very recently, and had settled on using a set of `(start, length)` pairs in a (red/black) btree indexed by `start` (written in C with Python bindings). Then I realised that in my case a bitmap would be sufficiently efficient, and of course a bitmap implementation is relatively trivial. Could that work for you? – Robie Basak Oct 25 '10 at 12:27
  • Thanks for the reply! In my case, things are a bit more involved, as I need to attach data to every interval, and as intervals get altered or merged, I need to alter/merge the corresponding data too. I'm not sure it would be that easy or efficient to keep a mapping from bitarray regions to data. In a tree I'd simply store the data in the nodes. – buddahfist Oct 25 '10 at 14:11

3 Answers

banyan supports deleting intervals from the tree. For example, banyan.SortedSet (an augmented red-black tree) can be used to remove, in O(n log n) time, the minimal number of intervals from a list so that the remaining intervals do not overlap:

from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan

def maximize_nonoverlapping_count(intervals):
    # build "interval" tree sorted by the end-point O(n log n)
    # (no tuple-unpacking lambda, so this also parses on Python 3)
    tree = SortedSet(intervals, key=lambda iv: (iv[1], iv[1] - iv[0]),
                     updator=OverlappingIntervalsUpdator)
    result = []
    while tree: # while there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(tree.pop()) # O(log n)

        # remove intervals that overlap with the popped interval
        overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
        tree -= overlapping_intervals # O(m log n)
    return result

Example:

print(maximize_nonoverlapping_count([[3, 4], [5, 8], [0, 6], [1, 2]]))
# -> [[1, 2], [3, 4], [5, 8]]
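
For comparison, here is a minimal sketch of the same greedy idea without banyan: sort by end-point, then keep each interval that starts after the last kept one ends. The function name `max_nonoverlapping` is made up for illustration, and it assumes closed intervals, so intervals that merely touch at an end-point count as overlapping. It only covers this one-shot selection; it does not give you a tree you can keep inserting into and deleting from.

```python
def max_nonoverlapping(intervals):
    """Greedy selection of a largest set of pairwise non-overlapping intervals.

    Assumption: intervals are closed [start, end], so touching counts as overlap.
    """
    result = []
    # Same ordering as the banyan key: by end-point, then by length.
    for start, end in sorted(intervals, key=lambda iv: (iv[1], iv[1] - iv[0])):
        # Keep the interval only if it begins after the last kept one ends.
        if not result or start > result[-1][1]:
            result.append([start, end])
    return result


print(max_nonoverlapping([[3, 4], [5, 8], [0, 6], [1, 2]]))
# -> [[1, 2], [3, 4], [5, 8]]
```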

See Python - Removing overlapping lists.

Community
jfs

Storing all the intersection intervals may help.

You need:

  • the boundaries of the union of all intervals,
  • for each intersection interval, its left boundary and the list of intervals that form it.

The intersection intervals can be stored in a tree, because each one is identified by its left boundary alone. Insert and delete then look roughly like this (simplified):

Insert: find the intersection intervals that contain the left and right boundaries of the new interval, and split each of them into 2 or 3 new intersection intervals. For every intersection interval in between, add a pointer to the new interval.

Delete: find the intersection intervals at the left and right boundaries of the deleted interval, and merge them into the intersection intervals that precede them. For every intersection interval in between, remove the pointer to the deleted interval.
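
A rough sketch of the insert/delete steps above, under a few assumptions of mine: intervals are half-open `[start, end)`, the class and method names (`SegmentMap`, `insert`, `delete`, `overlap`) are invented for illustration, and plain sorted lists with `bisect` stand in for the tree. List insertion is O(n), so a balanced tree would be needed to get logarithmic updates.

```python
import bisect


class SegmentMap:
    """Intersection intervals keyed by left boundary (half-open [start, end))."""

    def __init__(self):
        self.bounds = []     # sorted left boundaries of the intersection intervals
        self.covers = []     # covers[i]: ids of intervals covering [bounds[i], bounds[i+1])
        self.intervals = {}  # id -> (start, end, data)
        self._next_id = 0

    def _split(self, x):
        """Make x a boundary (splitting a segment if needed); return its index."""
        i = bisect.bisect_left(self.bounds, x)
        if i == len(self.bounds) or self.bounds[i] != x:
            # The new segment inherits the cover of the segment it was cut from.
            inherited = set(self.covers[i - 1]) if i > 0 else set()
            self.bounds.insert(i, x)
            self.covers.insert(i, inherited)
        return i

    def insert(self, start, end, data=None):
        if start >= end:
            raise ValueError("empty interval")
        key = self._next_id
        self._next_id += 1
        self.intervals[key] = (start, end, data)
        lo, hi = self._split(start), self._split(end)
        for i in range(lo, hi):  # every segment in between points at the new interval
            self.covers[i].add(key)
        return key

    def delete(self, key):
        start, end, _ = self.intervals.pop(key)
        lo = bisect.bisect_left(self.bounds, start)
        hi = bisect.bisect_left(self.bounds, end)
        for i in range(lo, hi):
            self.covers[i].discard(key)
        # Merge: drop boundaries whose segment now equals the one before it.
        for i in range(hi, lo - 1, -1):
            prev = self.covers[i - 1] if i > 0 else set()
            if self.covers[i] == prev:
                del self.bounds[i], self.covers[i]

    def overlap(self, start, end):
        """Ids of all stored intervals that intersect [start, end)."""
        i = max(bisect.bisect_right(self.bounds, start) - 1, 0)
        j = bisect.bisect_left(self.bounds, end)
        return set().union(*self.covers[i:j])
```

The payload attached to each interval lives in `intervals[key]`, so modifying an interval can be done as a `delete` followed by an `insert` with the updated data.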

Ante

If you're looking for a Python library that handles interval arithmetic, consider python-interval. Disclaimer: I'm the maintainer of that library.

This library can check for overlaps and automatically merges intervals. For example:

>>> import intervals as I
>>> I.closed(1,2) | I.closed(2,3)
[1,3]
>>> I.closed(1,2).overlaps(I.closed(3,4))
False

If you want to specifically compute the overlap:

>>> I.closed(1,3) & I.closed(2, 4)
[2,3]

It supports open and closed intervals, finite or infinite. To remove an interval from a given one, just use the difference operator:

>>> I.closed(1, 4) - I.closed(2, 3)
[1,2) | (3,4]

I can help you if you can be a little bit more specific.

Guybrush
  • I see your library is limited to integers, floats and maybe other default variable types. But I see an amazing use for this library for versioning. E.g. I.openclosed(1.2.1,1.3.2) would be (1.2.1,1.3.2] and would contain everything from 1.2.1.x,>1.2.2 up to 1.3.2 – XChikuX Feb 13 '19 at 18:57
  • It is not limited to integers or floats, but allows any comparable values to be stored as bounds. It's interesting that you suggest using versions as bounds, because that was exactly why I created this library in the first place ;-) I needed a uniform way to encode dependency constraints used in different packaging ecosystems, and the easiest one was to convert all these constraints to intervals (and I didn't find a library to handle such intervals at that time). – Guybrush Feb 18 '19 at 10:08
  • That's awesome. I was wondering if you have an open-source version of that versioning library. I tried using my own with yours, and I get AttributeError: '_PInf' object has no attribute 'vlist'. Here vlist is an object inside my versioning class that is derived from a version string. – XChikuX Feb 19 '19 at 00:02
  • Have a look here: https://github.com/AlexandreDecan/secos-constraints/blob/master/constraints/versions.py – Guybrush Feb 19 '19 at 07:47