I have two sets of ranges. Each range is a pair of integers (start and end) representing some sub-range of a single larger range. The two sets of ranges are in a structure similar to this (of course the ...s would be replaced with actual numbers).
$a_ranges =
{
a_1 =>
{
start => ...,
end => ...,
},
a_2 =>
{
start => ...,
end => ...,
},
a_3 =>
{
start => ...,
end => ...,
},
# and so on
};
$b_ranges =
{
b_1 =>
{
start => ...,
end => ...,
},
b_2 =>
{
start => ...,
end => ...,
},
b_3 =>
{
start => ...,
end => ...,
},
# and so on
};
I need to determine which ranges from set A overlap with which ranges from set B. Given two ranges, it's easy to determine whether they overlap. I've simply been using a double loop to do this--loop through all elements in set A in the outer loop, loop through all elements of set B in the inner loop, and keep track of which ones overlap.
I'm having two problems with this approach. First is that the overlap space is extremely sparse--even if there are thousands of ranges in each set, I expect each range from set A to overlap with maybe 1 or 2 ranges from set B. My approach enumerates every single possibility, which is overkill. This leads to my second problem--the fact that it scales very poorly. The code finishes very quickly (sub-minute) when there are hundreds of ranges in each set, but takes a very long time (+/- 30 minutes) when there are thousands of ranges in each set.
Is there a better way I can index these ranges so that I'm not doing so many unnecessary checks for overlap?
Update: The output I'm looking for is two hashes (one for each set of ranges) where the keys are range IDs and the values are the IDs of the ranges from the other set that overlap with the given range in this set.