2

A friend gave me this problem as a challenge, and I've tried to find a problem like this on LeetCode, but sadly could not.

Question

Given a line of people numbered from 1 to N, and a list of pairs of M enemies, find the total number of sublines with people that contain no two people that are enemies.

Example: N = 5, enemies = [[3,4], [3,5]]

Answer: 9

Explanation: These continuous subintervals are:

[1,1], [2,2], [3,3], [1,2], [2,3], [1,3], [4,4], [4,5], [5,5]

My approach

We define a non-conflicting interval as a contiguous interval from (and including) [a,b] where no two people are enemies in that interval.

Working backwards, if I know there is a non conflicting interval from [1,3] like in the example given above, I know the number of contiguous intervals between those two numbers is n(n+1)/2 where n is the length of the interval. In this case, the interval length is 3, and so there are 6 intervals between (and including) [1,3] that count.

Extending this logic, if I have a list of all non-conflicting intervals, then the answer is simply the sum of (n_i*(n_i+1))/2 for every interval length n_i.

Then all I need to do is find these intervals. This is where I'm stuck.


I can't really think of a similar programming problem. This seems similar, but the opposite of what the Merge Intervals problem on leetcode asks for. In that problem we're sorta given the good intervals and are asked to combine them. Here we're given the bad.

Any guidance?


EDIT: Best I could come up with:

Does this work?

So let's define max_enemy[i] as the largest enemy that is less that a particular person i, where i is the usual [1,N]. We can generate this value in O(M) time simply using a the following loop:

max_enemy = [-1] * (N+1) # -1 means it doesn't exist
for e1, e2 in enms:
    e1, e2 = min(e1,e2), max(e1, e2)
    max_enemy[e2] = max(max_enemy[e2], e1)

Then if we go through the person's array keeping a sliding window. The sliding window ends as soon as we find a person i who has: max_enemy[i] < i. This way we know that including this person will break our contiguous interval. So we now know our interval is [s, i-1] and we can do our math. We reset s=i and continue.

Here is a visualization of how this works visually. We draw a path between any two enemies:

N=5, enemies = [[3,4], [3,5]]

1   2   3   4  5
        |   |  |
        -----
        |      |
        --------

EDIT2: I know this doesn't work for N=5, enemies=[[1,4][3,5]], currently working on a fix, still stuck

JoeVictor
  • 1,806
  • 1
  • 17
  • 38

2 Answers2

2

You can solve this in O(M log M) time and O(M) space.

Let ENDINGAT(i) be the number of enemy-free intervals ending at position/person i. This is also the size of the largest enemy-free interval ending at i.

The answer you seek is the sum of all ENDINGAT(i) for every person i.

Let NEAREST(i) be the nearest enemy of person i that precedes person i. Let it be -1 if i has no preceding enemies.

Now we can write a simple formula to calculate all the ENDINGAT(values):

ENDINGAT(1) = 1, since there is only one interval ending at 1. For larger values:

ENDINGAT(i) = MIN( ENDINGAT(i-1)+1, i-NEAREST(i) )

So, it is very easy to calculate all the ENDINGAT(i) in order, as long as we can have all the NEAREST(i) in order. To get that, all you need to do is sort the enemy pairs by the highest member. Then for each i you can walk over all the pairs ending at i to find the closest one.

That's it -- it turns out to be pretty easy. The time is dominated by the O(M log M) time required to sort the enemy pairs, unless N is much bigger than M. In that case, you can skip runs of ENDINGAT for people with no preceding enemies, calculating their effect on the sum mathematically.

Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87
  • *Let ENDINGAT(i) be the number of enemy-free intervals ending at position/person i. This is also the size of the largest enemy-free interval ending at i.* Can you explain this a little more? I can't wrap my head around that term for some reason – JoeVictor Jul 13 '20 at 01:43
  • 1
    The intervals ending at i are [i,i], [i-1,i], [i-2,i], etc. If, say i-2 and i-3 are enemies, then the next interval, [i-3, i], is not enemy-free, nor are any larger intervals. The size of [i-2,i] is 3 it is the largest enemy-free interval ending at i, and its size is the number of enemy-free intervals ending at i. – Matt Timmermans Jul 13 '20 at 01:50
  • yes sure I get the calculation, you are 100% correct btw, I ran your logic on all the test cases I wrote vs. a brute force approach and they all pass! ....... I'm sorry for being slow on this, but why does the math work out? I don't understand how adding up the *size* of the intervals, i.e. the `ENDINGAT`'s, somehow adds up to be the total number of subintervals – JoeVictor Jul 13 '20 at 01:57
  • Oh, you accepted the other answer so I thought you were done. Anyway, as you consider intervals ending at i, in increasing order of length starting with [i,i], you stop when you get one that contains an enemy pair. There are 2 cases: 1) the enemy pair ends *before* i. In this case it will also be limiting the intervals ending at i-1. Otherwise 2) the enemy pair ends at i. The MIN considers both of these cases and picks the most limiting one. – Matt Timmermans Jul 14 '20 at 01:05
  • We get the minimum, because both cases are limits, and the tightest limit applies. I think you need to try a few examples by hand. – Matt Timmermans Jul 14 '20 at 12:28
1

There's a cool visual way to see this!

Instead of focusing the line, let's look at the matrix of pairs of players. If ii and j are enemies, then the effect of this enemiship is precisely to eliminate from consideration (1) this interval, and (2) any interval strictly larger than it. Because enemiship is symmetric, we may as well just look at the upper-right half of the matrix, and the diagonal; we'll use the characters

  • "X" to denote that a pair is enemies,
  • "*" to indicate that a pair has been obscured by a pair of enemies, and
  • "%" in the lower half to mark it as not part of the upper-half matrix.

For the two examples in your code, observe their corresponding matrices:

# intervals:  9                   # intervals:  10

 0   1   2   3   4                 0   1   2   3   4      
------------------------          ------------------------
             *   *   | 0                        *   *   | 0           
 %           *   *   | 1            %           X   *   | 1           
 %   %       X   X   | 2            %   %           X   | 2           
 %   %   %           | 3            %   %   %           | 3           
 %   %   %   %       | 4            %   %   %   %       | 4           

The naive solution, provided below, solves the problem in O(N^2 M) time and O(N^2) space.

def matrix(enemies):
    m = [[' ' for j in range(N)] for i in range(N)]
    for (i,j) in enemies:
        m[i][j] = 'X' #Mark Enemiship
        # Now mark larger intervals as dead.
        for q in range(0,i+1):
            for r in range(j,N):
                if m[q][r] == ' ':
                    m[q][r] = '*'

    num_int = 0
    for i in range(N):
        for j in range(N):
            if(j < i):
                m[i][j] = '%'
            elif m[i][j] == ' ':
                num_int+=1

    print("# intervals: ", num_int)
    return m

To convince yourself further, here are the matrices where

  1. player 2 is enemies with himself, so that there is a barrier, and there are two smaller versions of the puzzle on the intervals [0,1] and [3,4] each of which admits 3 sub-intervals)
  2. Every player is enemies with the person two to their left, so that only length-(1 or 0) intervals are allowed (of which there are 4+5=9 intervals)
# intervals:  6                   # intervals:  9

 0   1   2   3   4                 0   1   2   3   4      
---------[===========+              --------[============+
         *   *   *  || 0                    X   *   *  || 0           
 %       *   *   *  || 1            %           X   *  || 1           
 %   %   X   *   *  II 2            %   %           X  II 2           
 %   %   %           | 3            %   %   %           | 3           
 %   %   %   %       | 4            %   %   %   %       | 4           


Complexity: Mathematically the same as sorting a list, or validating that it is sorted. that is, O(M log M) in the worst case, and O(M) space to sort, and still at least O(M) time in the best case to recognize if the list is sorted.
Bonus: This is also an excellent example to illustrate power of looking at the identity a problem, rather than its solution. Such a view of the of the problem will also inform smarter solutions. We can clearly do much better than the code I gave above...

We would clearly be done, for instance, if we could count the number of un-shaded points, which is the area of the smallest convex polygon covering the enemiships, together with the two boundary points. (Finding the two additional points can be done in O(M) time.) Now, this is probably not a problem you can solve in your sleep, but fortunately the problem of finding a convex hull, is so natural that the algorithms used to do it are well known.

In particular, a Graham Scan can do it in O(M) time, so long as we happen to be given the pairs of enemies so that one of their coordinates is sorted. Better still, once we have the set of points in the convex hull, the area can be calculated by dividing it into at most M axis-aligned rectangles. Therefore, if enemy pairs are sorted, the entire problem could be solved in O(M) time. Keep in mind that M could be dramatically than N, and we don't even need to store N numbers in an array! This corresponds to the arithmetic proposed to skip lines in the other answer to this question.

If they are not sorted, other Convex Hull algorithms yield an O(M log M) running time, with O(M) space, as given by @Matt Timmermans's solution. In fact, this is the general lower bound! This can be shown with a more complicated geometric reduction: if you can solve the problem, then you can compute the sum of the heights of each number, multiplied by its distance to "the new zero", of agents satisfying j+i = N. This sum can be used to compute distances to the diagonal line, which is enough to sort a list of numbers in O(M) time---a problem which cannot be solved in under O(M log M) time for adversarial inputs.

Ah, so why is it the case that we can get an O(N + M) solution by manually performing this integration, as done explicitly in the other solution? It is because we can sort the M numbers if we know that they fall into N bins, by Bucket Sort.

Thanks for sharing the puzzle!

  • While this answer is really cool, do you think there is a better than O(N^2 * M) solution? If we're going for runtimes of that magnitude, we might as well brute force the solution. – JoeVictor Jul 13 '20 at 01:06
  • @QuantumHoneybees I've updated my answer to reflect a running time analysis. – Oliver Richardson Jul 13 '20 at 01:40
  • @Matt Timmermans in his answer below has has given an algorithm that matches the O(M log M) lower bound of the running time in the worst case. – Oliver Richardson Jul 13 '20 at 01:42
  • @Matt-Timmermans: do you have a better argument for the lower bound? – Oliver Richardson Jul 13 '20 at 02:20
  • @QuantumHoneybees do you buy the lower bound analysis? – Oliver Richardson Jul 13 '20 at 02:21
  • not really.. i have a working version of Matt Timmerman's code running without sorting, using the O(M) check I mentioned in the comments to his post. So the runtime there is O(N+M). I tried it with a bunch of random inputs and they all seem to pass. Either we've discovered a great breakthrough in the field of CS or I'm an idiot. My money is on the latter. – JoeVictor Jul 13 '20 at 02:31
  • @QuantumHoneybees I believe I've shown it to be equivalent to sorting a list of numbers. Therefore we have an O(M log M) lower bound (unless the list is sorted and we can do it in O(M)) – Oliver Richardson Jul 13 '20 at 02:33
  • (@QuantumHoneybees oops second ping; don't know if the empty initial comment above ate the notification for the actual message I added afterwards) – Oliver Richardson Jul 13 '20 at 02:36
  • @QuantumHoneybees Oh that's the coolest bit! The code you linked to there leverages the +N running time cost in the same way that [Radix Sort](https://en.wikipedia.org/wiki/Radix_sort) has a dependence on the number of bits needed to store the number. You can eliminate that +N cost, but only for a multiplicative log N --- just like in sorting! – Oliver Richardson Jul 13 '20 at 02:44
  • @QuantumHoneybees I've cleaned up my answer; if you think it's interesting enough that you want to clean up the question / answer with your own code and understanding, please feel free to edit! – Oliver Richardson Jul 13 '20 at 03:19