I have two sets of intervals: one containing “positive” intervals between, say, 0 and 100; and the other containing “negative” intervals between -100 and 0. The intervals in each set are not necessarily unique (maybe “collection” is a better word in this case than “set”), and can overlap. For example, the positive set is

{ [0, 10], [5,15], [5,15], [10,15], [10,20], [25, 40] }

and the negative set is

{ [-15, 0], [-15,-5], [-20,-15], [-30,-25] }

The adjacent non-overlapping intervals (i.e. those intervals where the right end-point of one is equal to the left end-point of the other) within each set can be combined to form longer intervals, e.g. [0,10] + [10,20] = [0,20] and [-15,0] + [-20,-15] = [-20,0], but [0,10] and [5,15] cannot be combined into [0,15].

The positive and negative intervals may be cancelled with each other if they span exactly the same range in absolute numbers, e.g. [5,15] + [-15,-5] = 0 and [0,10] + [10,20] + [-15,0] + [-20,-15] = [0,20] + [-20,0] = 0.
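To make the two rules concrete, here is a minimal sketch, assuming intervals are stored as `(left, right)` tuples with positive length. The helper names `join` and `cancels` are hypothetical and introduced only for illustration.

```python
def join(chain):
    """Join a non-empty chain of intervals into one interval.

    Returns None if any neighbouring pair is not exactly adjacent,
    i.e. the right end-point of one differs from the left end-point
    of the next (this also rejects overlapping intervals).
    """
    chain = sorted(chain)
    for (_, right), (left, _) in zip(chain, chain[1:]):
        if right != left:
            return None
    return (chain[0][0], chain[-1][1])


def cancels(pos_chain, neg_chain):
    """True if the joined positive chain spans exactly the mirror image
    of the joined negative chain."""
    p, n = join(pos_chain), join(neg_chain)
    return p is not None and n is not None and p == (-n[1], -n[0])


print(join([(0, 10), (10, 20)]))                            # (0, 20)
print(join([(0, 10), (5, 15)]))                             # None
print(cancels([(5, 15)], [(-15, -5)]))                      # True
print(cancels([(0, 10), (10, 20)], [(-15, 0), (-20, -15)])) # True
```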

I am looking for an efficient algorithm for joining and cancelling the intervals in a way that minimizes the total combined length of the remaining intervals. In the example, the remaining total length = len([5,15]) + len([10,15]) + len([25,40]) + len([-30,-25]) = 10 + 5 + 15 + 5 = 35.
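For checking candidate algorithms on small inputs, an exhaustive search can serve as a reference. The sketch below is not the efficient algorithm I am looking for: it runs in exponential time in the worst case. It assumes that every interval has positive length and that a cancellation always pairs one chain of adjacent positive intervals with one chain of adjacent negative intervals of exactly the same span. The names `chains_from`, `drop`, `max_cancelled` and `min_remaining` are hypothetical.

```python
from functools import lru_cache


def chains_from(point, pool, used=()):
    """Yield (end_point, indices) for every chain of adjacent intervals
    from `pool` that starts at `point`. The empty chain is included."""
    yield point, used
    tried = set()  # identical duplicates lead to identical chains
    for i, (left, right) in enumerate(pool):
        if left == point and right not in tried:
            tried.add(right)
            yield from chains_from(right, pool, used + (i,))


def drop(pool, indices):
    """Return `pool` without the intervals at the given indices."""
    return tuple(iv for i, iv in enumerate(pool) if i not in indices)


@lru_cache(maxsize=None)
def max_cancelled(pos, neg):
    """Largest length cancellable on each side. `pos` and `neg` are sorted
    tuples of (left, right); negatives are already mirrored to positive."""
    if not pos or not neg:
        return 0
    (start, first_end), rest = pos[0], pos[1:]
    # Option 1: leave the left-most positive interval uncancelled.
    best = max_cancelled(rest, neg)
    # Option 2: it starts a positive chain (it has the smallest left
    # end-point, so it cannot sit in the middle of a chain).
    for end, p_used in chains_from(first_end, rest):
        for n_end, n_used in chains_from(start, neg):
            if n_end == end:
                gain = (end - start) + max_cancelled(
                    drop(rest, p_used), drop(neg, n_used))
                best = max(best, gain)
    return best


def min_remaining(positive, negative):
    pos = tuple(sorted((l, r) for l, r in positive))
    neg = tuple(sorted((-r, -l) for l, r in negative))  # mirror negatives
    total = sum(r - l for l, r in pos + neg)
    return total - 2 * max_cancelled(pos, neg)


positive = [(0, 10), (5, 15), (5, 15), (10, 15), (10, 20), (25, 40)]
negative = [(-15, 0), (-15, -5), (-20, -15), (-30, -25)]
print(min_remaining(positive, negative))  # 35
```

On the example above it reproduces the remaining length of 35.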

Maybe this type of problem has been addressed already somewhere in the literature or here (I couldn’t find anything, but maybe it’s just because I don’t know how to formulate it in a formal way), so I would be grateful for references and links; or a solution posted here would of course also suffice.

Below are my first naive thoughts on the (very) high-level steps that could be taken. The idea is that a positive interval whose left end-point matches with a left end-point of some negative interval is "potentially cancelable" either if its right end-point matches a right end-point of some negative interval, or if one of its adjacent intervals is "potentially cancelable".

Let's use positive numbers for both sets to denote intervals' left (l) and right (r) end-points, calling them l+ / l- and r+ / r- for positive / negative set. Set S = 0.

  1. Find all left end-points such that l+ = l- = l and all right end-points such that r+ = r- = r. For each such l and each r, find n_l = min{number of positive intervals with l+ = l; number of negative intervals with l- = l} and n_r = min{number of positive intervals with r+ = r; number of negative intervals with r- = r}.

  2. Find the smallest l_min from the set of matched left end-points {l} from Step 1 and find the largest r_max from the set of matched right end-points {r} from Step 1. Keep all the intervals that fall entirely between l_min and r_max for further processing in the next steps. Calculate S = S + (the total length of the intervals which do not fall entirely between the two bounds l_min and r_max).

  3. Order the intervals in each set by the left end-points in an ascending order.

  4. At each left end-point, arrange intervals by their length in a descending order.

  5. Loop over all positive intervals starting at the left-most point l_min.

  6. Compare the right end-point of the interval with the set of matched right end-points r from Step 2.

  7. If no match in Step 6 is found, then look for a next interval whose left end-point is equal to the right end-point of the current interval.

  8. If an interval in Step 7 is found, then use it to go back to Step 6.

  9. If no interval in Step 7 is found, then add the length of the current interval to the sum of lengths S. Decrease n_l corresponding to the left end-point of the current interval by 1: n_l := n_l - 1. If the resulting new value of n_l = 0 then go to Step 2. If n_l > 0 then go to Step 5 and take the next interval with the same left end-point as the current interval. Remove the current interval from further steps.

  10. If a match in Step 6 is found, then use negative intervals to go to Step 5.
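As an illustration of Steps 1 and 2 only, the matched end-point counts could be computed as in the sketch below. It assumes that "positive numbers for both sets" means mirroring each negative interval [-b, -a] to (a, b), and it uses the fact that `&` on `collections.Counter` objects keeps the minimum of the two counts. The function name `matched_endpoints` is hypothetical.

```python
from collections import Counter


def matched_endpoints(positive, negative):
    """Return (n_l, n_r): for each left / right end-point shared by both
    collections, the smaller of the two occurrence counts."""
    # Assumption: negatives are mirrored so both sides use positive numbers.
    mirrored = [(-r, -l) for l, r in negative]
    n_l = Counter(l for l, _ in positive) & Counter(l for l, _ in mirrored)
    n_r = Counter(r for _, r in positive) & Counter(r for _, r in mirrored)
    return n_l, n_r


positive = [(0, 10), (5, 15), (5, 15), (10, 15), (10, 20), (25, 40)]
negative = [(-15, 0), (-15, -5), (-20, -15), (-30, -25)]

n_l, n_r = matched_endpoints(positive, negative)
# min() / max() raise ValueError if there are no matched end-points at all.
l_min, r_max = min(n_l), max(n_r)
print(dict(n_l), dict(n_r), l_min, r_max)
```

For the example this gives matched left end-points 0, 5 and 25 (once each), matched right end-points 15 (twice) and 20 (once), and therefore l_min = 0 and r_max = 20.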

work in progress...

[...]

  1. For each set (positive S+ and negative S-) construct the longest possible combinations of intervals treating non-unique intervals as identical. Say there are N_C+ and N_C- different combinations possible each containing N_k+ and N_k- intervals after joining with k+ = 1..N_C+ and k- = 1..N_C-.

  2. Compare these combinations between two sets (starting with those combinations which contain the longest intervals) eliminating / canceling sections which coincide.

  3. Calculate the total remaining length.

Obviously, there are many details that have to be filled in for the above, but at this point I am not even sure if this approach guarantees finding the minimum solution.

Confounded
  • Have you got an existing algorithm to be improved or worked example of minimizing the problem? The efficiency factor of your question may imply you already have one – Joseph Young Sep 28 '16 at 12:51
  • @Joseph Young No, I don't have anything at this point. – Confounded Sep 28 '16 at 12:52
  • Maybe you can explore a segment tree for this... – Pranalee Sep 28 '16 at 12:55
  • Why do you have the intervals as sets? To me it looks as if they are just two sets of numbers overall (one for positive and the other for negative). Does this kind of arrangement solve any particular problem for you? – Soundararajan Sep 28 '16 at 12:56
  • @Soundararajan Sorry, I am not sure I understood your comment / question. – Confounded Sep 28 '16 at 12:58
  • @Nico Schertler Thank you for your comment. I am not sure if it can be described as simply as that. I think it is not just a question of any unions; I think that the way in which the unions are formed (i.e. which intervals are joined together) can affect the result. – Confounded Sep 28 '16 at 14:34
  • @Nico Schertler For example, given {[0,10],[10,15],[10,40]} and {[-40,0],[-15,0]}, I can combine [0,10]+[10,40]=[0,40] and cancel this with [-40,0], leaving len([-15,0])+len([10,15]) = 15 + 5 = 20. Or I can combine [0,10]+[10,15]=[0,15] and cancel this with [-15,0], leaving len([10,40])+len([-40,0]) = 30 +40 = 70. – Confounded Sep 28 '16 at 14:47
  • @Nico Schertler Sorry, maybe I wasn't clear enough in my OP, only non-overlapping adjacent intervals may be combined to form a new interval. – Confounded Sep 28 '16 at 14:55
  • I see. Ignore anything I said. But edit your question to reflect that requirement. – Nico Schertler Sep 28 '16 at 14:57
  • I see only a difficulty around intermediate equal segments ... Do you have a list of test data (to conclude quickly)? –  Sep 28 '16 at 15:41
  • @igael Thank you for your interest. I don't have a list ready, but maybe the example from comments in reply to Nico Schertler above can be used as a first go. – Confounded Sep 28 '16 at 15:57
  • Interesting question. I have in mind an exact solution, but before I describe it: This looks very much like a question from a programming competition (and the solution makes this even more likely). Please give some evidence (e.g. a link) that it is not from a *currently live* competition. – j_random_hacker Sep 28 '16 at 16:50
  • j_random_hacker Thank you for your interest. I am not sure what link you have in mind, but it is not, as far as I know, a question from a programming competition. It is related to my work in risk management. – Confounded Sep 28 '16 at 16:57
  • @Confounded : what will be the size of the data ? it is not the same exercise with 16000 segments including 10000 duplicates. –  Sep 28 '16 at 17:13
  • @igael Sorry, I am not sure I understand how the size of data (btw, what do you mean here? number of intervals?) affects the complexity of the solution. – Confounded Sep 28 '16 at 17:16
  • @Confounded : yes, the number of data items and the possibility of duplicates and overlaps. Duplicates consume steps. Imagine your sample set duplicated 1000 times and then remove 100 random segments. –  Sep 28 '16 at 17:19
  • @igael I don't really deal with the real data that will be used. I only try to prototype a solution. But, I expect there to be order 10^3 intervals with around 25% duplicates. – Confounded Sep 28 '16 at 17:29
  • OK. In that case I'd like to know the specific application, and whether or not you would be interested in paying me for a solution. (I'm serious about both parts: A real-world problem that can be modeled in this way is interesting, and I have to think about what to do after my current contract ends.) – j_random_hacker Sep 28 '16 at 17:56
  • @j_random_hacker If I wanted to hire a freelancer to develop a solution, I would have gone to one of many other sites that exist for that purpose (I will not be naming them here as it might be construed as advertising). As stated in my OP, I was looking first of all for references to published works that deal with this kind of problems and one of the comments above suggested to look into "segment trees". With regard to the problem, it is somewhat more artificial than real, as it is created by regulators of the industry, and even then it is only one of possible ways to interpret their rules. – Confounded Sep 29 '16 at 08:32
  • I see. Good luck solving the problem! – j_random_hacker Sep 29 '16 at 10:15
  • One quick preprocessing trick you can apply is to delete (i.e., record as "unmatched") any segment that contains any point not covered by a segment of the opposite sense. This can be done in O(n log n) time. Depending on your input, this might do nothing, or reduce the problem considerably. And BTW, you need to prepend a "@" to a username for people to be notified of comments. – j_random_hacker Sep 29 '16 at 10:40
  • @j_random_hacker Thank you for your input – Confounded Sep 29 '16 at 10:44
  • @j_random_hacker Confounded : I let you discuss business yesterday ... Then the subject is reopened ... I have an algorithm, but it needs a lot of (classical and boring) scripting. Consider stacks of segments beginning at the same place, sorted by their end, and then pair them from the biggest to the smallest. I think it is more a question for math.stackexchange, because one must show that, whatever choice is made, nothing is missed –  Sep 29 '16 at 15:31
  • @igael: If your algorithm is the one I'm thinking of, there are counterexamples to its optimality. But (a) maybe it isn't that algorithm, and (b) it may be useful to the OP to have a heuristic method anyway, so I encourage you to post it. Also, the CS StackExchange is probably a better choice. – j_random_hacker Sep 29 '16 at 15:45
  • @j_random_hacker, I'm already using it but I forgot it when I first commented the question ... Sometimes, analogies between applications are not so obvious and as in a hospital, I forget a lot of things ... The only trouble is that it is exponential and then efficient only with small sets ... You must insert a link for your job in your profile, I know many people who prefer to look for here than at any other source of experts to work with. I found my last job on math.overflow 4 or 5 years ago ... –  Sep 29 '16 at 16:05
  • @igael: Thanks, that's interesting about the jobs. I've previously successfully arranged a contract here on SO. Regarding the algorithm: If it's exponential-time then it's not the one I'm thinking of and thus could well be optimal (though, as you say, it has the disadvantage of being exponential-time...) – j_random_hacker Sep 29 '16 at 18:50
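
The preprocessing trick suggested in the comments above (record as unmatched any segment containing a point that no segment of the opposite sense covers) could be sketched as follows. This is one possible reading of that comment, not a confirmed implementation: it assumes negative intervals have already been mirrored to positive numbers, and the names `merge_cover` and `prune` are hypothetical. Sorting plus binary search keeps it at O(n log n).

```python
from bisect import bisect_right


def merge_cover(intervals):
    """Union of closed intervals as a sorted list of disjoint [left, right]."""
    merged = []
    for left, right in sorted(intervals):
        if merged and left <= merged[-1][1]:  # overlaps or touches
            merged[-1][1] = max(merged[-1][1], right)
        else:
            merged.append([left, right])
    return merged


def prune(segments, opposite):
    """Split `segments` into (kept, dropped): a segment is kept only if
    it lies entirely inside the region covered by `opposite`."""
    cover = merge_cover(opposite)
    starts = [left for left, _ in cover]
    kept, dropped = [], []
    for left, right in segments:
        i = bisect_right(starts, left) - 1  # last component starting <= left
        if i >= 0 and right <= cover[i][1]:
            kept.append((left, right))
        else:
            dropped.append((left, right))
    return kept, dropped


positive = [(0, 10), (5, 15), (5, 15), (10, 15), (10, 20), (25, 40)]
mirrored_negative = [(0, 15), (5, 15), (15, 20), (25, 30)]
kept, dropped = prune(positive, mirrored_negative)
print(dropped)  # [(25, 40)]
```

A single pass is shown; the pass could be repeated in both directions until nothing more is dropped, since removing [25, 40] here leaves the mirrored negative (25, 30) without cover.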

0 Answers