13

This was an interview question.

I was given an array of n+1 integers from the range [1,n]. The property of the array is that it has k (k>=1) duplicates, and each duplicate can appear more than twice. The task was to find an element of the array that occurs more than once in the best possible time and space complexity.

After significant struggling, I proudly came up with an O(n log n) solution that takes O(1) space. My idea was to divide the range [1,n] into two halves and determine which of the two halves contains more elements of the input array than it has distinct values (I was using the pigeonhole principle). The algorithm continues recursively on that half until it reaches an interval [X,X], where X occurs at least twice and is therefore a duplicate.
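The halving idea described above could be sketched roughly as follows. This is only an illustration, assuming a 0-based JavaScript array holding n+1 integers from [1,n]; the function name `findDuplicateByCounting` is made up for this sketch, and the loop is written iteratively rather than recursively.

```javascript
// Sketch: binary search over the value range [1, n] using the pigeonhole principle.
// Assumes arr has n + 1 integer entries, all within [1, n]. The array is only read.
function findDuplicateByCounting(arr) {
  let lo = 1;
  let hi = arr.length - 1; // n
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    // Count how many entries fall into the lower half [lo, mid].
    let count = 0;
    for (const v of arr) {
      if (v >= lo && v <= mid) count++;
    }
    if (count > mid - lo + 1) {
      hi = mid; // more entries than distinct values: a duplicate lives here
    } else {
      lo = mid + 1; // otherwise the upper half must be overfull
    }
  }
  return lo;
}

console.log(findDuplicateByCounting([2, 3, 4, 1, 5, 4, 6, 7, 8]));
```

Each pass over the array costs O(n) and the range halves every time, which gives the O(n log n) time and O(1) space mentioned in the question.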

The interviewer was satisfied, but then he told me that there exists an O(n) solution with constant space. He generously offered a few hints (something related to permutations?), but I had no idea how to come up with such a solution. Assuming that he wasn't lying, can anyone offer guidelines? I have searched SO and found a few (easier) variations of this problem, but not this specific one. Thank you.

EDIT: In order to make things even more complicated, interviewer mentioned that the input array should not be modified.

Rose M
  • Can't you just put all integers in a map with the number as key and the occurrence count as value and then go through all the keys? This would be O(n) time, I think, but also O(n) space. – maraca Feb 17 '18 at 12:48
  • @maraca That would be `O(n)` space at least. – Aurel Bílý Feb 17 '18 at 12:49
  • Ah I see, you can just sort by inserting each element at the position it would have if there were no duplicates; if that position already holds the value it should have, you have found a duplicate. – maraca Feb 17 '18 at 12:54
  • How do you sort in O(n) time with O(1) space? – giusti Feb 17 '18 at 12:55
  • Not even a little reversible modification? (like making an element negative)? – rici Feb 17 '18 at 19:38
  • @rici making an element negative sounds like it technically wouldn't be `O(1)` space anymore. – גלעד ברקן Feb 18 '18 at 01:04
  • @גלעד ברקן technically, not. But interview questions are not always technically precise. – rici Feb 18 '18 at 04:16
  • I'd create a bool[n]. For each value, set bool[value] to true. If bool[value] is already true before setting it, then the value is a duplicate. I'd also use a BitArray instead of a bool[] if it's C#. – Koray Feb 18 '18 at 10:02
  • @Rose M: Finally got it, you were right about the permutation cycles. – maraca Feb 20 '18 at 09:52

4 Answers

14
  1. Take the very last element (x).

  2. Save the element at position x (y).

  3. If x == y you found a duplicate.

  4. Overwrite position x with x.

  5. Assign x = y and continue with step 2.

You are basically sorting the array, which is possible because you know where each element has to go. Every overwrite in step 4 puts one value at its own position, so there can be at most n overwrites before step 3 finds a duplicate: O(n) time and O(1) extra space. You just have to be careful with the indices; for simplicity I assumed the first index is 1 here (not 0), so we don't have to do +1 or -1.
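A minimal sketch of the five steps above, assuming a 0-based JavaScript array of n+1 integers from [1,n], so position x in the description corresponds to index x - 1. The name `findDuplicateInPlace` is hypothetical, and the function overwrites parts of its input.

```javascript
// Sketch of the in-place variant: follow the chain starting at the last element
// and overwrite each visited position with its own index until a position
// already holds its own value.
// Assumes arr has n + 1 integer entries in [1, n]. WARNING: modifies arr.
function findDuplicateInPlace(arr) {
  let x = arr[arr.length - 1]; // step 1: take the very last element
  while (true) {
    const y = arr[x - 1];      // step 2: save the element at position x
    if (x === y) return x;     // step 3: position x already holds x
    arr[x - 1] = x;            // step 4: overwrite position x with x
    x = y;                     // step 5: continue with the saved element
  }
}

console.log(findDuplicateInPlace([2, 3, 4, 1, 5, 4, 6, 7, 8]));
```

The loop has no explicit bound because, under the stated input assumption, a duplicate must be hit after at most n overwrites; for untrusted input an iteration limit would be a sensible addition.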

Edit: without modifying the input array

This algorithm is based on the idea that the entry point of the permutation cycle is also a duplicate, so it is enough to find that entry point (again using a 1-based array for simplicity):

Example:

2 3 4 1 5 4 6 7 8

Entry: 8 7 6

Permutation cycle: 4 1 2 3

As we can see the duplicate (4) is the first number of the cycle.

  1. Finding the permutation cycle

    1. x = last element
    2. x = element at position x
    3. repeat step 2. n times (in total); this guarantees that we have entered the cycle
  2. Measuring the cycle length

    1. a = last x from above, b = last x from above, counter c = 0
    2. a = element at position a, b = element at position b, b = element at position b, c++ (so we make 2 steps forward with b and 1 step forward in the cycle with a)
    3. if a == b the cycle length is c, otherwise continue with step 2.
  3. Finding the entry point to the cycle

    1. x = last element
    2. x = element at position x
    3. repeat step 2. c times (in total)
    4. y = last element
    5. if x == y then x is a solution (x made one full cycle and y is just about to enter the cycle)
    6. x = element at position x, y = element at position y
    7. repeat steps 5. and 6. until a solution was found.

The 3 major steps are all O(n) and sequential therefore the overall complexity is also O(n) and the space complexity is O(1).
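The three phases could be sketched as follows. This is an illustration under assumptions, not a definitive implementation: it expects a 0-based JavaScript array of n+1 integers from [1,n], uses a small helper `at` to read 1-based positions, and the names `findDuplicateReadOnly` and `at` are made up here.

```javascript
// Sketch of the read-only variant: enter the cycle, measure its length,
// then locate its entry point. Assumes arr has n + 1 integer entries in [1, n].
// The array is never written to.
function findDuplicateReadOnly(arr) {
  const n = arr.length - 1;
  const at = (p) => arr[p - 1]; // element at 1-based position p

  // Phase 1: walk n steps from the last element so that x is inside the cycle.
  let x = at(n + 1);
  for (let i = 0; i < n; i++) x = at(x);

  // Phase 2: measure the cycle length c (a moves one step, b moves two).
  let a = x;
  let b = x;
  let c = 0;
  do {
    a = at(a);
    b = at(at(b));
    c++;
  } while (a !== b);

  // Phase 3: give x a head start of c steps, then advance x and y together.
  x = at(n + 1);
  for (let i = 0; i < c; i++) x = at(x);
  let y = at(n + 1);
  while (x !== y) {
    x = at(x);
    y = at(y);
  }
  return x; // first value of the cycle, reached from two different positions
}

console.log(findDuplicateReadOnly([2, 3, 4, 1, 5, 4, 6, 7, 8]));
```

Because x stays exactly one cycle length ahead of y, the two first coincide at the value where the path from the last element joins the cycle, and that value is stored at two different positions.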

Example from above:

  1. x takes the following values: 8 7 6 4 1 2 3 4 1 2

  2. a takes the following values: 2 3 4 1 2
    b takes the following values: 2 4 2 4 2
    therefore c = 4 (yes there are 5 numbers but c is only increased when making steps, not initially)

  3. x takes the following values: 8 7 6 4 | 1 2 3 4
    y takes the following values: | 8 7 6 4
    x == y == 4 in the end and this is a solution!

Example 2 as requested in the comments: 3 1 4 6 1 2 5

  1. Entering cycle: 5 1 3 4 6 2 1 3

  2. Measuring cycle length:
    a: 3 4 6 2 1 3
    b: 3 6 1 4 2 3
    c = 5

  3. Finding the entry point:
    x: 5 1 3 4 6 | 2 1
    y: | 5 1
    x == y == 1 is a solution

maraca
  • Wow, that was fast! Thank you, my first upvote here :) Interviewer mentioned (I forgot to add) that input array should not be modified. Can you come up with a solution in such case? – Rose M Feb 17 '18 at 13:19
  • @RoseM only if there is exactly one duplicate. – maraca Feb 17 '18 at 13:28
  • That's the complicated part - there can be more than one :/ – Rose M Feb 17 '18 at 13:33
  • Thanks for updated solution, trying to grasp it. So, if we process elements 1,2,4,7 their xor will be 0 but there was no duplicate in this sequence? Did I misunderstand something? – Rose M Feb 17 '18 at 14:01
  • @RoseM 1,2,4,7 is not n+1 integers from the interval [1, n] – גלעד ברקן Feb 17 '18 at 14:45
  • @גלעדברקן This is not the whole array, just the first 4 processed elements. That was the counter-example for the algorithm that maraca provided. – Rose M Feb 17 '18 at 14:56
  • Looks very promising. I'm struggling with unambiguously interpreting 1) do 2) `repeat step 2. n times`: is that *for a total of `n` + 1* or *for a total of `n`*? It may be easier to denote *1) `n` time do 2)  *. – greybeard Feb 20 '18 at 10:31
  • @greybeard it's the total, n times not n+1, I was thinking about that too. Corrected it now and check for equality had to be reversed, should be correct now. – maraca Feb 20 '18 at 10:34
  • This looks promising, I will analyze it later and let you know! – Rose M Feb 21 '18 at 12:48
  • Could you please show the steps of how your algorithm would be applied to [3,1,4,6,1,2,5] ? – גלעד ברקן Feb 21 '18 at 16:38
  • @גלעדברקן yes, I edited my answer and added your example. – maraca Feb 21 '18 at 16:59
  • Great answer. Looking at it, this is completely insane to expect from someone in <=1 hour, even with hints. Thanks! – Rose M Feb 21 '18 at 21:31
  • Thanks. Great idea to measure cycle length! – גלעד ברקן Feb 22 '18 at 04:35
  • @RoseM yes, many interview questions are pretty hard to solve unless you solved a similar problem before. Also I think usually you want to know all duplicates, not just finding one. I even know a company that will give you unsolvable problems to see how you cope with stress. – maraca Feb 23 '18 at 10:15
5

Here is a possible implementation:

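function checkDuplicate(arr) {
  // Assumes every value is an integer in [0, arr.length - 1].
  // Note: arr is reordered in place while searching.
  console.log(arr.join(", "));
  for (let pos = 0; pos < arr.length; pos++) {
    // Keep moving the value found at pos to its home index
    // until index pos holds the value pos.
    while (arr[pos] !== pos) {
      const cur = arr[pos];
      if (arr[cur] === cur) {
        // The home index of cur already holds cur, so cur occurs twice.
        console.log(`> duplicate is ${cur}`);
        return cur;
      }
      // Swap cur into its home index; the displaced value is examined next.
      arr[pos] = arr[cur];
      arr[cur] = cur;
    }
  }
  console.log("> no duplicate");
  return -1;
}

for (const t of [
  [0, 1, 2, 3],
  [0, 1, 2, 1],
  [1, 0, 2, 3],
  [1, 1, 0, 2, 4],
]) checkDuplicate(t);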

It is basically the solution proposed by @maraca (typed too slowly!). It needs only constant extra space (for the local variables) and otherwise uses just the original array for its storage. It should be O(n) in the worst case, because the process terminates as soon as a duplicate is found.

Aurel Bílý
2

If you are allowed to non-destructively modify the input vector, then it is pretty easy. Suppose we can "flag" an element in the input by negating it (which is obviously reversible). In that case, we can proceed as follows:

Note: The following assumes that the vector is indexed starting at 1. Since it is probably indexed starting at 0 (in most languages), you can implement "Flag item at index i" as "Negate the item at index i-1".

  1. Set i to 0 and do the following loop:
    1. Increment i until item i is unflagged.
    2. Set j to i and do the following loop:
      1. Set j to vector[j] (ignoring the flag, i.e. take the absolute value).
      2. If the item at j is flagged, j is a duplicate. Terminate both loops.
      3. Flag the item at j.
      4. If j != i, continue the inner loop.
  2. Traverse the vector setting each element to its absolute value (i.e. unflag everything to restore the vector).
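The flagging procedure above might look like this in JavaScript. It is a sketch under assumptions: a 0-based array of n+1 integers from [1,n] (so no value is 0 and negation is a usable flag), with the hypothetical name `findDuplicateByFlagging`. The array is modified temporarily and restored before returning.

```javascript
// Sketch of the negate-to-flag variant. Position p (1-based) lives at index p - 1.
// Assumes vec has n + 1 integer entries in [1, n]. vec is restored before returning.
function findDuplicateByFlagging(vec) {
  let duplicate = -1;
  outer:
  for (let i = 1; i <= vec.length; i++) {
    if (vec[i - 1] < 0) continue; // item i is already flagged, try the next one
    let j = i;
    do {
      j = Math.abs(vec[j - 1]);   // follow the chain, ignoring the flag
      if (vec[j - 1] < 0) {       // item j was reached before: j is a duplicate
        duplicate = j;
        break outer;
      }
      vec[j - 1] = -vec[j - 1];   // flag item j
    } while (j !== i);
  }
  // Unflag everything to restore the original contents.
  for (let k = 0; k < vec.length; k++) vec[k] = Math.abs(vec[k]);
  return duplicate;
}

console.log(findDuplicateByFlagging([2, 3, 4, 1, 5, 4, 6, 7, 8]));
```

The return value -1 is only a fallback for input that violates the assumption; with n+1 values from [1,n] a duplicate is always found.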
rici
  • Thanks for the answer, +1 from me. Interviewer said "read-only input array", so I don't think that even this was allowed. Nice algorithm, though. – Rose M Feb 18 '18 at 07:25
-1
  • It depends on what tools you (your app) can use. Many frameworks and libraries exist nowadays. For example, with the C++ standard library you can use std::map<>, as maraca mentioned.

  • Or, if you have time, you can write your own implementation of a binary tree, but keep in mind that inserting elements works differently than with a plain array. That way you can optimise the search for duplicates for your particular case.

binary tree expl. ref: https://www.wikiwand.com/en/Binary_tree

Maksym
  • I could have used a map, but that wouldn't be `O(1)` space. The interviewer specifically requested constant space :/ – Rose M Feb 17 '18 at 13:07