
I have a question about divide and conquer algorithms. Suppose you are given a random integer list in Python that consists of:

  1. Unique contiguous pairs of integers
  2. A single integer somewhere in the list

Both conditions must hold exactly, so while [2,2,1,1,3,3,4,5,5,6,6] is valid, these are not:

  1. [2,2,2,2,3,3,4] (violates condition 1: there are two pairs of 2s, but any number may form at most one pair)
  2. [1,4,4,5,5,6,6,1] (violates condition 1: the two 1s would form a pair, but they are not contiguous)
  3. [1,4,4,5,5,6,6,3] (violates condition 2: there are two single numbers, 1 and 3)
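To make the two conditions concrete, here is a minimal validator sketch. The name `is_valid` is hypothetical, and it assumes (as the examples suggest) that every value belongs to exactly one group, either a pair or the single. In a valid list, neighbouring groups hold different values, so `arr[i] == arr[i + 1]` can only happen inside a pair, which is why a greedy left-to-right scan is enough.

```python
def is_valid(arr):
    """True if arr is contiguous pairs plus exactly one single, all values distinct."""
    groups = []   # the value of each pair or single, in order
    singles = 0
    i = 0
    while i < len(arr):
        if i + 1 < len(arr) and arr[i] == arr[i + 1]:
            groups.append(arr[i])   # a contiguous pair
            i += 2
        else:
            groups.append(arr[i])   # a lone value
            singles += 1
            i += 1
    # condition 2: exactly one single; condition 1: no value used by two groups
    return singles == 1 and len(set(groups)) == len(groups)


print(is_valid([2, 2, 1, 1, 3, 3, 4, 5, 5, 6, 6]))  # True
print(is_valid([1, 4, 4, 5, 5, 6, 6, 3]))           # False
```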

Now the question is: can you find the index of the 'single' number with an O(lg n) algorithm?

My first attempt is this:

def single_num(array, arr_max_len):

  i = 0

  while (i < arr_max_len):
    if (arr_max_len - i == 1):
      return i
    elif (array[i] == array[i + 1]):
      i = i + 2
    else:
      return i # don't have to worry about odd index because it will never happen
  
  return None 

However, the algorithm seems to run in O(n/2) time (which is still O(n)), and that seems like the best it can do.

Even if I use divide and conquer, I don't think it will get better than O(n/2) time, unless there's some method that's beyond my understanding at the moment.
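For reference, here is the scan above instrumented to count element comparisons (the counting wrapper and its name are illustrative additions, not part of the question). In the worst case, with the single number last, it makes (n - 1) / 2 comparisons, so the cost grows linearly with n.

```python
def single_num_counting(array):
    """The linear scan from the question, also returning how many comparisons it made."""
    comparisons = 0
    n = len(array)
    i = 0
    while i < n:
        if n - i == 1:
            return i, comparisons
        comparisons += 1
        if array[i] == array[i + 1]:
            i += 2
        else:
            return i, comparisons
    return None, comparisons


# Worst case: the single number is last, so every pair is compared once.
for pairs in (10, 100, 1000):
    arr = [v for p in range(pairs) for v in (p, p)] + [-1]
    index, comps = single_num_counting(arr)
    print(len(arr), index, comps)
```

This prints `21 20 10`, `201 200 100` and `2001 2000 1000` (length, index, comparisons): ten times the input, ten times the work.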

Does anyone have a better idea, or could I argue that this is already O(log n) time?

EDIT: It seems Manuel has the best solution. When I have time, I'll implement a solution myself to understand it, and then accept Manuel's answer.

2 Answers


Solution

Just binary search the even indexes to find the first whose value differs from the next value.

from bisect import bisect

def single_num(a):
    class E:
        def __getitem__(_, i):
            return a[2*i] != a[2*i+1]
    return 2 * bisect(E(), False, 0, len(a)//2)

Explanation

Visualization of the virtual "list" E() that I'm searching on:

       0  1   2  3   4  5   6  7   8  9   10 (indices)
  a = [2, 2,  1, 1,  3, 3,  4, 5,  5, 6,  6]
E() = [False, False, False, True,  True]
       0      1      2      3      4     (indices)

At the beginning, the pairs match (so != gives False). From the single number onward, the pairs don't match (so != gives True). Since False < True, that's a sorted list, which bisect happily searches.
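A small sketch of what bisect actually touches. `CountingE` is a hypothetical variant of `E` that records each pair index bisect asks for; the full list of flags is materialized here only to show the False/True split, which costs O(n) and is not something the real solution does. On the 11-element example, bisect reads only pair indices 2, 4 and 3.

```python
from bisect import bisect

a = [2, 2, 1, 1, 3, 3, 4, 5, 5, 6, 6]

# The virtual list E(), materialized for illustration only (this costs O(n)).
flags = [a[2*i] != a[2*i+1] for i in range(len(a) // 2)]
print(flags)  # [False, False, False, True, True]


class CountingE:
    """Like E, but records every pair index that bisect reads."""

    def __init__(self, arr):
        self.arr = arr
        self.probed = []

    def __getitem__(self, i):
        self.probed.append(i)
        return self.arr[2*i] != self.arr[2*i+1]


small = CountingE(a)
print(2 * bisect(small, False, 0, len(a) // 2), small.probed)  # 6 [2, 4, 3]

# 1000 pairs with the single number last (index 2000).
big = [v for p in range(1000) for v in (p, p)] + [-1]
large = CountingE(big)
print(2 * bisect(large, False, 0, len(big) // 2), len(large.probed))
```

For the 2001-element list, the number of probes is at most `(1000).bit_length()`, i.e. 10, because each probe at least halves the remaining range of pair indices.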

Alternative implementation

Without bisect, if you're not yet tired of writing binary searches:

def single_num(a):
    i, j = 0, len(a) // 2
    while i < j:
        m = (i + j) // 2
        if a[2*m] == a[2*m+1]:
            i = m + 1
        else:
            j = m
    return 2*i

Sigh...

I wish bisect would support giving it a callable so I could just do return 2 * bisect(lambda i: a[2*i] != a[2*i+1], False, 0, len(a)//2). Ruby does, and it's perhaps the most frequent reason I sometimes solve coding problems with Ruby instead of Python.

Testing

Btw, I tested both with every possible position of the single number, for up to 1000 pairs:

from random import random

for pairs in range(1001):
    a = [x for _ in range(pairs) for x in [random()] * 2]
    single = random()
    assert len(set(a)) == pairs and single not in a
    for i in range(0, 2*pairs+1, 2):
        a.insert(i, single)
        assert single_num(a) == i
        a.pop(i)
Manuel
  • This gives incorrect answer for some cases e.g. `single_num([2, 1, 1])` it reports as 0 (should be 2). – DarrylG Mar 02 '21 at 12:51
  • @DarrylG No, index 0 is the correct answer. – Manuel Mar 02 '21 at 12:54
  • @Manuel--thought it would report the number not the index of the number as the question asks. I would suggest a comment in your code to clarify the type of result. – DarrylG Mar 02 '21 at 13:01
  • @DarrylG You can see from the OP's code that they want the index, so I'm not sure why you thought I'd do something different. Also, index is more useful, as you can easily get the value from the index in O(1) but not the other way around. – Manuel Mar 02 '21 at 13:05
  • @Manuel--no problem. I was going by OP comment--"Now the question is can you find the **'single' number** in an O(lgn) algorithm" and going by Burntice answer. Your answer is much improved with the additional explanation in the code so I'll upvote it. – DarrylG Mar 02 '21 at 13:09
  • @DarrylG Yeah, that sentence is ambiguous and it's not clarified elsewhere in the text, either, so I went with what the code says (usually I don't, often I don't even read it, given that people ask about it because it doesn't work :-). – Manuel Mar 02 '21 at 13:14
  • Sorry for the confusion, I really needed more sleep, though it’s just the difference between returning i and array[i] in my OP. Fixed it for consistency. – RosaryLightning X Mar 02 '21 at 13:19
  • This is a really clever answer, though it draws from utilizing different computing mechanisms. I didn’t specify it as an exclusive condition in my post, so this is incredibly helpful and elegant, provided that we’re not required to come up with restricted solutions (I.e. using only what basic Python provides) – RosaryLightning X Mar 02 '21 at 13:30
  • @RosaryLightningX What do you mean with "different computing mechanisms" and "basic Python"? – Manuel Mar 02 '21 at 13:34
  • That was too quick! – RosaryLightning X Mar 02 '21 at 13:39
  • Sorry again, I hate not being able to figure this out, but for the initial solution (bisect), wouldn't looping through the even indexes of the whole list still put it at O(n/2), so O(n/2) + O(lg n) is still O(n/2)? Brb after some work... – RosaryLightning X Mar 02 '21 at 15:12
  • @RosaryLightningX Not sure what you're talking about. I'm not looping. – Manuel Mar 02 '21 at 16:36
  • Isn’t checking through each even index to see if it’s equivalent to the next in O(n/2) time? Could you elaborate as to how the solutions are O(lg n)? Sorry this might come from not fully understanding bisect. – RosaryLightning X Mar 02 '21 at 23:36
  • @RosaryLightningX That's not what `bisect` does. It's a binary search. So it looks only at O(lg n) elements (and thus my `E` -object also only looks at O(lg n) elements). – Manuel Mar 02 '21 at 23:40
  • Can you elaborate on the logic of what part of the list we’re discarding with each iteration with bisect? – RosaryLightning X Mar 02 '21 at 23:42
  • @RosaryLightningX The remaining half that doesn't contain what we search for. It's just an ordinary binary search. – Manuel Mar 02 '21 at 23:42
  • @RosaryLightningX the intuition for this binary search is that before the single number occurs, all pairs are on indices (2k, 2k+1) i.e. first element on even, duplicate element on odd index. After the single element occurs, all duplicates are on indices (2k+1, 2k), the first element on odd the duplicate on even. This means that if we check an even index, we can tell if we are to the left of the single element or to its right: when the element to the right of the even index is a duplicate, we are on the left. When it is not a duplicate, we are on the right – Ciprian Tomoiagă Mar 23 '21 at 16:16

An O(lg n) algorithm typically splits the input into smaller parts and discards some of them, so that each step has a smaller input to work with. Since this is a searching problem, the likely route to O(lg n) time complexity is binary search, which splits the input in half each time.


My approach is to start off with a few simple cases, to spot any patterns that I can make use of.

In the following examples, the largest integer is the target number.

# input size: 3  
[1,1,2]
[2,1,1]

# input size: 5  
[1,1,2,2,3]
[1,1,3,2,2]
[3,1,1,2,2]

# input size: 7  
[1,1,2,2,3,3,4]
[1,1,2,2,4,3,3]
[1,1,4,2,2,3,3]
[4,1,1,2,2,3,3]

# input size: 9  
[1,1,2,2,3,3,4,4,5]
[1,1,2,2,3,3,5,4,4]
[1,1,2,2,5,3,3,4,4]
[1,1,5,2,2,3,3,4,4]
[5,1,1,2,2,3,3,4,4]
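The examples above follow a regular construction: pairs 1..x in order, with the largest value x + 1 placed after the last pair and then moved left one pair at a time. A hypothetical helper `examples(x)` reproduces them and can be used to look for the pattern at larger sizes:

```python
def examples(x):
    """All lists of length 2*x + 1 in the style above.

    Pairs are 1..x in order; the single value is x + 1 (the largest),
    placed after the last pair first and then moved one pair to the left.
    """
    pairs = [v for p in range(1, x + 1) for v in (p, p)]
    return [pairs[:2 * slot] + [x + 1] + pairs[2 * slot:]
            for slot in range(x, -1, -1)]


for row in examples(3):
    print(row)
```

`examples(3)` prints the four size-7 lists shown above, in the same order.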

You may notice that the input size is always odd, i.e. 2*x + 1.

Since this is a binary search, you can check if the middle number is your target number. If the middle number is the single number (if middle_number != left_number and middle_number != right_number), then you have found it. Otherwise, you have to search the left side or the right side of the input.

Notice that in the examples above where the middle number is not the target number, there is a pattern relating the middle number to its pair.

For input size 3 (2*1 + 1), if middle_number == left_number, the target number is on the right, and vice versa.

For input size 5 (2*2 + 1), if middle_number == left_number, the target number is on the left, and vice versa.

For input size 7 (2*3 + 1), if middle_number == left_number, the target number is on the right, and vice versa.

For input size 9 (2*4 + 1), if middle_number == left_number, the target number is on the left, and vice versa.

That means the parity of x in 2*x + 1 (the array length) determines which side to search: if middle_number == left_number, search the right side when x is odd and the left side when x is even (and the opposite when middle_number == right_number).
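The rule can be checked mechanically. This sketch (the helper names `target_side` and `check_rule` are hypothetical) applies it to a whole list of length 2*x + 1 and compares the result with the true position of the single number for every placement with up to 50 pairs. Why it holds: before the single number each pair starts at an even index, and after it each pair starts at an odd index, so the parity of the middle index x plus which neighbour matches tells you which side of the single number the middle is on.

```python
def target_side(array):
    """Apply the parity rule to a whole valid list of length 2*x + 1.

    Returns 'found' if the middle element is the single number,
    otherwise 'left' or 'right' for the side that contains it.
    """
    n = len(array)
    mid = x = n // 2                     # for n = 2*x + 1, the middle index is x
    if n == 1 or (array[mid] != array[mid - 1] and array[mid] != array[mid + 1]):
        return 'found'
    equals_left = array[mid] == array[mid - 1]
    # middle == left: right if x is odd, left if x is even (and vice versa)
    if equals_left == (x % 2 == 1):
        return 'right'
    return 'left'


def check_rule(max_pairs=50):
    """Compare target_side with the true index of the single number."""
    for x in range(1, max_pairs + 1):
        pairs = [v for p in range(1, x + 1) for v in (p, p)]
        for slot in range(x + 1):
            arr = pairs[:2 * slot] + [0] + pairs[2 * slot:]  # single value 0
            s = 2 * slot                                     # its index
            expected = 'found' if s == x else ('left' if s < x else 'right')
            assert target_side(arr) == expected, (arr, expected)
    return True


print(check_rule())  # True
```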


Based on all this information, you can come up with a recursive solution. Note that you have to keep the input size odd in each recursive call. (Edit: Keeping the input size odd makes the code even messier. You probably want a solution in which the parity of the input size does not matter.)

def find_single_number(array: list, start_index: int, end_index: int):
    # base case: array length == 1
    if start_index == end_index:
        return start_index
    
    middle_index = (start_index + end_index) // 2
        
    # base case: found target
    if array[middle_index] != array[middle_index - 1] and array[middle_index] != array[middle_index + 1]:
        return middle_index
        
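    # use the parity of the subarray length to decide which side to search;
    # the subarray length is end_index - start_index + 1 == 2*x + 1
    x = (end_index - start_index) // 2

    # keep the subarray length odd; start_index stays even across calls
    # (it starts at 0), so middle_index % 2 == x % 2
    include_middle = (middle_index % 2 == 0)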
        
    if array[middle_index] == array[middle_index - 1]:  # middle == number on its left
        if x % 2 == 0:  # x is even
            # search left side
            return find_single_number(
                array,
                start_index,
                middle_index if include_middle else middle_index - 1
            )

        else:  # x is odd
            # search right side
            return find_single_number(
                array,
                middle_index if include_middle else middle_index + 1,
                end_index,
            )

    else:  # middle == number on its right
        if x % 2 == 0:  # x is even
            # search right side
            return find_single_number(
                array,
                middle_index if include_middle else middle_index + 1,
                end_index,
            )

        else:  # x is odd
            # search left side
            return find_single_number(
                array,
                start_index,
                middle_index if include_middle else middle_index - 1
            )


# test out the code
if __name__ == '__main__':
    array = [2,2,1,1,3,3,4,5,5,6,6]  # target: 4 (index: 6)
    print(find_single_number(array, 0, len(array) - 1))

    array = [1,1,2]  # target: 2 (index: 2)
    print(find_single_number(array, 0, len(array) - 1))

    array = [1,1,3,2,2]  # target: 3 (index: 2)
    print(find_single_number(array, 0, len(array) - 1))

    array = [1,1,4,2,2,3,3]  # target: 4 (index: 2)
    print(find_single_number(array, 0, len(array) - 1))

    array = [5,1,1,2,2,3,3,4,4]  # target: 5 (index:0)
    print(find_single_number(array, 0, len(array) - 1))

My solution is probably not the most efficient or elegant, but I hope my explanation helps you understand how to approach this kind of algorithmic problem.
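Following the edit above, one way to make the parity of the input size irrelevant is to search over pair indices instead of array indices. This is the same idea as the bisect answer, written recursively; `find_single_number_pairs` is a hypothetical name, and `lo`/`hi` bound the pair indices still under consideration.

```python
def find_single_number_pairs(array, lo=0, hi=None):
    """Recursive search over pair indices, so subarray parity never matters.

    Pair index i covers array[2*i] and array[2*i + 1]; the search space
    [lo, hi) roughly halves on every call.
    """
    if hi is None:
        hi = len(array) // 2          # number of complete pairs
    if lo == hi:
        return 2 * lo                 # the first broken pair starts at the single number
    m = (lo + hi) // 2
    if array[2 * m] == array[2 * m + 1]:
        return find_single_number_pairs(array, m + 1, hi)  # single is after pair m
    return find_single_number_pairs(array, lo, m)          # single is at or before 2*m


print(find_single_number_pairs([2, 2, 1, 1, 3, 3, 4, 5, 5, 6, 6]))  # 6
```

Called as `find_single_number_pairs(array)`, it returns the same indices as the test cases above (6, 2, 2, 2 and 0).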


Proof that it has a time complexity of O(lg n):

Let's assume that the most important operation is the comparison of the middle number against the left and right number (if array[middle_index] != array[middle_index - 1] and array[middle_index] != array[middle_index + 1]), and that it has a time cost of 1 unit. Let us refer to this comparison as the main comparison.

Let T be time cost of the algorithm.
Let n be the length of the array.

Since this solution involves recursion, there is a base case and recursive case.

For the base case (n = 1), it is just the main comparison, so:
T(1) = 1.

For the recursive case, the input is split in half (either left half or right half) each time; at the same time, there is one main comparison. So:
T(n) = T(n/2) + 1

Now, I know that the input size must always be odd, but let us assume that $n = 2^k$ for simplicity; the time complexity would still be the same.

We can rewrite $T(n) = T(n/2) + 1$ as:

$$T(2^k) = T(2^{k-1}) + 1$$

Also, $T(1) = 1$ is:

$$T(2^0) = 1$$

When we expand $T(2^k) = T(2^{k-1}) + 1$, we get:

$$
\begin{aligned}
T(2^k) &= T(2^{k-1}) + 1 \\
&= [T(2^{k-2}) + 1] + 1 = T(2^{k-2}) + 2 \\
&= [T(2^{k-3}) + 1] + 2 = T(2^{k-3}) + 3 \\
&= [T(2^{k-4}) + 1] + 3 = T(2^{k-4}) + 4 \\
&= \dots \quad \text{(repeat until } k\text{)} \\
&= T(2^{k-k}) + k = T(2^0) + k = k + 1
\end{aligned}
$$

Since $n = 2^k$, that means $k = \log_2 n$.

Substituting $n$ back in, we get: $T(n) = \log_2 n + 1$

The + 1 is a constant, so it can be dropped; the same goes for the base of the logarithm, which only changes a constant factor.

Therefore, the time complexity of the algorithm is:
T(n) = O(lg n)
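As a numeric sanity check on the derivation (an addition, not part of the proof), the recurrence can be evaluated directly. Floor division stands in for n/2 when n is not a power of two; that is an assumption, since the recursive solution actually hands on either x or x + 1 elements.

```python
def T(n):
    """Cost model from the proof: T(1) = 1 and T(n) = T(n // 2) + 1."""
    return 1 if n <= 1 else T(n // 2) + 1


# Powers of two: the closed form T(2^k) = k + 1 holds exactly.
for k in range(20):
    assert T(2 ** k) == k + 1

# Other sizes: with floor division, T(n) = floor(log2 n) + 1 = n.bit_length().
for n in range(1, 5000):
    assert T(n) == n.bit_length()

print(T(1024), T(1000))  # 11 10
```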

NJHJ
  • Thank you, I still need to read this in detail after I doze a little, but my confusion is since the list is not sorted in any way, would not "binary search" still end up with worst case O(n/2) since we can't guarantee discarding the left or right list? Wish I had more time to investigate this problem in closer detail... – RosaryLightning X Mar 02 '21 at 04:16
  • This is really quite a nice write-up. – Mad Physicist Mar 02 '21 at 04:32
  • @RosaryLightningX it is not simply the *original input* that is split in half each time (doing so gives time complexity of O(n/2)). It is every input, *both the original as well as those derived from the original*, that is split in half each time (O(log_2 n)). Perhaps I should edit my answer to include the proof. – NJHJ Mar 02 '21 at 04:41
  • I think I see it now — we’re discarding whichever list is even-numbered? I like your solution! If you’d like to refine it a bit I will accept it. – RosaryLightning X Mar 02 '21 at 05:07
  • @RosaryLightningX I have added the proof. I hope it helps. – NJHJ Mar 02 '21 at 05:09
  • Great write-up, but the code has several errors. 1) Use of float division causes `TypeError: list indices must be integers or slices, not float` and 2) Name error i.e. `NameError: name 'middle_number' is not defined`. – DarrylG Mar 02 '21 at 12:46
  • I think it looks on the right track, 1) might be solved with // (floor division) and 2) name error seems to be a typo. I’ll give it some more time if @BurntIce would like to fix it. – RosaryLightning X Mar 02 '21 at 13:22
  • Thanks for the proof! It really clarifies the answer. – RosaryLightning X Mar 02 '21 at 13:22
  • @Manuel Thanks for the heads-up. I have edited it; I hope it works now. – NJHJ Mar 03 '21 at 03:21