Rather than querying a list of intervals with a single search start and end date to retrieve every interval that overlaps that search range, what is the best approach to:
From a list of date intervals,
Find all unique sets of intervals
Where every interval in each set overlaps with each other interval in that set
Using an integer example, take the list of integer intervals [{1,3},{2,4},{4,5},{5,7},{6,8}]. From this list, the following are all the unique sets of intervals where every interval in each set overlaps with every other interval in that set:
{ {1,3}, {2,4} }
{ {2,4}, {4,5} }
{ {4,5}, {5,7} }
{ {5,7}, {6,8} }
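As a sanity check on the expected output above, here is a minimal sketch that reproduces it by exhaustive enumeration. It is not a proposed solution (it is exponential in the number of intervals). It assumes intervals are closed, so a shared endpoint counts as an overlap, that sets of a single interval are not of interest, and that only maximal sets should be reported. The names `overlaps` and `mutually_overlapping_sets` are hypothetical, and integer intervals are modelled as `(start, end)` tuples.

```python
from itertools import combinations

def overlaps(a, b):
    # Closed-interval test: (2, 4) and (4, 5) overlap because they share 4.
    return a[0] <= b[1] and b[0] <= a[1]

def mutually_overlapping_sets(intervals):
    # Collect every subset of size >= 2 whose members all pairwise overlap.
    candidates = []
    for size in range(2, len(intervals) + 1):
        for subset in combinations(intervals, size):
            if all(overlaps(a, b) for a, b in combinations(subset, 2)):
                candidates.append(frozenset(subset))
    # Keep only the maximal sets (those not strictly contained in another).
    return [c for c in candidates if not any(c < other for other in candidates)]

example = [(1, 3), (2, 4), (4, 5), (5, 7), (6, 8)]
result = mutually_overlapping_sets(example)
for group in result:
    print(sorted(group))
# [(1, 3), (2, 4)]
# [(2, 4), (4, 5)]
# [(4, 5), (5, 7)]
# [(5, 7), (6, 8)]
```

Because the sets are built as `frozenset` objects, identical intervals would collapse into one member here, which the real date data would need to handle separately.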
Here is the class for a DateInterval:
from datetime import datetime

class DateInterval(object):
    def __init__(self, start_time, end_time):
        self.start_time = datetime.strptime(start_time, '%Y-%m-%d %H:%M:%S')
        self.end_time = datetime.strptime(end_time, '%Y-%m-%d %H:%M:%S')

    ''' eq, gt, hash methods removed for clarity '''
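For anyone who wants to run the snippets in this question, here is one way the omitted comparison methods might look. This is only a sketch under my own assumptions: intervals compare by `(start_time, end_time)`, and I use `functools.total_ordering` with `__lt__` to derive `__gt__` rather than writing `__gt__` by hand. The `_key` helper and `__repr__` are hypothetical additions for illustration.

```python
from datetime import datetime
from functools import total_ordering

FMT = '%Y-%m-%d %H:%M:%S'

@total_ordering
class DateInterval(object):
    def __init__(self, start_time, end_time):
        self.start_time = datetime.strptime(start_time, FMT)
        self.end_time = datetime.strptime(end_time, FMT)

    def _key(self):
        # Hypothetical helper: the tuple used for equality, ordering and hashing.
        return (self.start_time, self.end_time)

    def __eq__(self, other):
        if not isinstance(other, DateInterval):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, DateInterval):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # Must agree with __eq__ so equal intervals hash alike.
        return hash(self._key())

    def __repr__(self):
        return 'DateInterval(%s, %s)' % (self.start_time, self.end_time)
```

With ordering defined this way, lists of intervals compare element by element, which is what sorting a list of interval sets relies on.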
I'll receive a list of intervals sorted by start_time ascending like so:
intervals = [
    DateInterval(start_time='2015-01-01 08:00:00', end_time='2015-01-01 08:30:00'),
    DateInterval(start_time='2015-01-01 08:00:00', end_time='2015-01-01 10:00:00'),
    DateInterval(start_time='2015-01-01 09:00:00', end_time='2015-01-01 11:00:00'),
    DateInterval(start_time='2015-01-01 10:00:00', end_time='2015-01-01 12:00:00'),
    DateInterval(start_time='2015-01-01 13:00:00', end_time='2015-01-01 16:00:00'),
    DateInterval(start_time='2015-01-01 14:00:00', end_time='2015-01-01 17:00:00'),
    DateInterval(start_time='2015-01-01 15:00:00', end_time='2015-01-01 18:00:00'),
    DateInterval(start_time='2015-01-01 20:00:00', end_time='2015-01-01 22:00:00'),
    DateInterval(start_time='2015-01-01 20:00:00', end_time='2015-01-01 22:00:00')
]
(In this example list, the start and end dates always land evenly on an hour. However, they could land on any second, or perhaps any millisecond.) After searching the many questions on Stack Overflow regarding overlapping intervals, I found the Interval Tree to be unsuitable for date intervals.
My lightly optimized brute force method consists of three tasks:
- Identify all non-unique sets of intervals where at least one interval in each set overlaps with all the other intervals in that set
- Deduplicate the results of step 1 to find all unique sets of intervals where at least one interval in each set overlaps with all the other intervals in that set
- From the results of 1, find only those sets where each interval in one set overlaps with all other intervals in that set
1.
The following finds all non-unique sets where at least one interval in each set overlaps with every other interval in that set, by naively comparing each interval in the interval list to all the other intervals. It assumes the list of intervals is sorted by date time ascending, which enables the break optimization.
def search(intervals, start_date, end_date):
    results = []
    for interval in intervals:
        if end_date >= interval.start_time:
            if start_date <= interval.end_time:
                results.append(interval)
        else:
            break  # This assumes intervals are sorted by date time ascending
    return results  # without this, callers would only ever get None
search is used like so:
brute_overlaps = []
for interval in intervals:
    brute_overlaps.append(search(intervals, interval.start_time, interval.end_time))
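To make the comparison inside search explicit: it treats both ranges as closed, so two intervals that merely touch at an endpoint count as overlapping. A small standalone sketch of just that test, using bare datetime values instead of DateInterval objects (`parse` and `ranges_overlap` are hypothetical names for illustration):

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'

def parse(text):
    # Same timestamp format as the DateInterval constructor.
    return datetime.strptime(text, FMT)

def ranges_overlap(start_a, end_a, start_b, end_b):
    # Same two comparisons as in search(): closed ranges, shared endpoints count.
    return end_a >= start_b and start_a <= end_b

# 08:00-10:00 and 10:00-12:00 share the instant 10:00.
touching = ranges_overlap(parse('2015-01-01 08:00:00'), parse('2015-01-01 10:00:00'),
                          parse('2015-01-01 10:00:00'), parse('2015-01-01 12:00:00'))

# 08:00-08:30 ends before 09:00-11:00 starts.
disjoint = ranges_overlap(parse('2015-01-01 08:00:00'), parse('2015-01-01 08:30:00'),
                          parse('2015-01-01 09:00:00'), parse('2015-01-01 11:00:00'))

print(touching, disjoint)  # True False
```

If touching intervals should not count as overlapping, the two comparisons would become strict (`>` and `<`), but that is not what the code in this question does.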
2.
The following deduplicates the list of sets:
def uniq(l):
    last = object()
    for item in l:
        if item == last:
            continue
        yield item
        last = item

def sort_and_deduplicate(l):
    return list(uniq(sorted(l, reverse=True)))
3.
And the following finds all sets where each interval in the set overlaps with all other intervals in that set, by naively comparing each interval in a set to every other interval in that set:
def all_overlap(overlaps):
    results = []
    for overlap in overlaps:
        is_overlap = True
        for interval in overlap:
            for other_interval in [o for o in overlap if o != interval]:
                if not (interval.end_time >= other_interval.start_time and interval.start_time <= other_interval.end_time):
                    is_overlap = False
                    break  # If one interval fails
            else:  # break out of
                continue  # both inner for loops
            break  # and try next overlap
        if is_overlap:  # all intervals in this overlap set overlap with each other
            results.append(overlap)
    return results