Suppose there is an array of random values, say arr = [3, 5, 1, 4, 3], and N = 5 is the size of the array. Is there a way to find the first repeated value in the array (here the answer is 3) in O(N) time without using any data structure such as a dictionary, map, or tree? Using a single variable is allowed.

The idea is to have an optimal space complexity.

I was asked this question in an interview.

Generally, this is solved with a dictionary: store each traversed item of the array as a key, with its count as the value. As soon as a count reaches 2, we have the answer. But if we cannot use a data structure, we need a loop within a loop to compare each item with the ones that follow.
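
For reference, the dictionary-based baseline described above might look like the following sketch. The function name `first_repeat_with_dict` is hypothetical; it only illustrates the O(N)-time, O(N)-space approach that the interviewer ruled out.

```python
def first_repeat_with_dict(arr):
    """Return the first value seen a second time, or None if all values are distinct."""
    counts = {}  # value -> number of times seen so far
    for value in arr:
        counts[value] = counts.get(value, 0) + 1
        if counts[value] == 2:  # second occurrence: this is the first repeat
            return value
    return None


print(first_repeat_with_dict([3, 5, 1, 4, 3]))  # 3
```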

I also tried to think of a solution by using just one variable, but a variable will not be enough.

I think it is quite impossible to get the solution in O(N). However, I could be wrong. Please help me find a solution to this.

EDITED

My apologies for not mentioning this before. The numbers within the array will always be from 1 to N, i.e., 1 <= arr[i] <= N.

  • Are you allowed to modify the array itself? – Sergey Kalinichenko Jul 10 '21 at 12:07
  • Assuming the array is all integers, test/set a bit in your one allowed variable for each number seen (e.g. bit 3 for the first entry, then bit 5, etc.); if the bit is already set then that’s a repeat. O(n). Or you can avoid using the extra variable by keeping the bitmap in the first entry of the array. – DisappointedByUnaccountableMod Jul 10 '21 at 12:17
  • "I was asked this question in an interview." Are you certain that you've remembered the interview question correctly? The canonical version of this interview question requires that the numbers be drawn from a set of 1..N, where N is the size of the array. – Sergey Kalinichenko Jul 10 '21 at 12:24
  • @SergeyKalinichenko Yes, you are allowed, but you can't have a copy of the original array. I actually tried to find a solution by doing that. I ended up changing the order of the original array. If you could do it, please let me know. – T.R. Bhavani shankar Jul 10 '21 at 14:02
  • @DanielHao Any kind of data structure is not allowed including Set. – T.R. Bhavani shankar Jul 10 '21 at 14:04
  • @barny That's interesting. So is this going to be like building the variable as we traverse and piecing it together in each iteration? For example, let's say that in the last iteration the variable is "3584", and then we traverse the variable to check for a duplicate? If so, this won't work. I gave this solution to the interviewer and they said this is again using a data structure, as the programming language does use an array to piece the variable together. – T.R. Bhavani shankar Jul 10 '21 at 14:13
  • @SergeyKalinichenko Yes, I correctly remember the question. And the numbers in the array will always be from 1 to N. Thanks for pointing this out. I will edit the question. But I'm certain that the question is correct. – T.R. Bhavani shankar Jul 10 '21 at 14:17
  • @T.R.Bhavanishankar - if the basic premises are clearly confirmed, my post should work then. Check it out. – Daniel Hao Jul 10 '21 at 14:18
  • That constraint you edited in is the only thing that makes this problem solvable. You should have included it from the beginning. Also note that your example violates the constraint. – Mark Ransom Jul 10 '21 at 14:41
  • @MarkRansom Damn! Sorry about that. Thanks for identifying it. Can you please let me know the solution that you are talking about? – T.R. Bhavani shankar Jul 10 '21 at 16:17
  • I'm not quite sure of the solution actually. The version of the problem I'm familiar with guarantees that there's only one number duplicated. – Mark Ransom Jul 10 '21 at 16:23

4 Answers

A simple solution would be to encode the information in the given array, if that is allowed:

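// Returns the first repeated value, or -1 if none; requires 1 <= arr[i] <= arr.length.
static int firstDuplicate(int[] arr) {
    for (int i = 0; i < arr.length; i++) {
        int number = Math.abs(arr[i]);   // original value, ignoring the "seen" sign flag
        if (arr[number - 1] < 0) {
            // number was seen before; to restore the original array first do:
            // for (int j = 0; j < arr.length; j++) if (arr[j] < 0) arr[j] *= -1;
            return number;
        }
        arr[number - 1] = -arr[number - 1];   // mark number as seen
    }
    return -1;
}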

Instead of returning the solution immediately, you could modify the array again (see the comment in the code) so that it is the same as the input. If a temporary modification is not possible either, then you probably need to work with permutation cycles. See the very similar question: Find a duplicate in array of integers
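
A minimal Python sketch of the same sign-marking idea, including the restoration step mentioned above, assuming every value satisfies 1 <= arr[i] <= len(arr). The function name `first_repeat_sign_marking` is made up for illustration and is not part of the original answer.

```python
def first_repeat_sign_marking(arr):
    """Return the first repeated value in arr (values in 1..len(arr)), or None.

    Uses the sign of arr[v - 1] as a "seen v" flag and clears the flags before returning.
    """
    result = None
    for i in range(len(arr)):
        number = abs(arr[i])            # original value, ignoring any flag
        if arr[number - 1] < 0:         # flag already set: number was seen before
            result = number
            break
        arr[number - 1] = -arr[number - 1]  # set the flag for number
    # Restore the input by clearing every flag.
    for j in range(len(arr)):
        arr[j] = abs(arr[j])
    return result


data = [3, 4, 1, 4, 3]
print(first_repeat_sign_marking(data))  # 4
print(data)                             # [3, 4, 1, 4, 3]
```

The restoration pass costs one more O(N) sweep, so the total running time stays linear.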

maraca

Please ask questions if you need more explanation of the logic. All the numbers are from 1 to N, as the OP has just updated. This simply uses the same array/list for the record-keeping. [Note] It assumes there is only ONE duplicate number in the list.

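A = [3, 1, 2, 5, 4, 3]   # values are in 1..N; 3 appears twice
#    *              *

N = len(A)

# First pass: for each value v, add N to slot v % N (the value N maps to slot 0).
for i in range(N):
    x = A[i] % N         # still the original value mod N, even after additions
    A[x] += N

print('the duplicate number: ')

# Second pass: a slot that was bumped twice now holds more than 2 * N.
for i in range(N):
    if A[i] > N * 2:
        print(i if i else N)   # 3 (slot 0 stands for the value N)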
Daniel Hao
  • I suppose O(2*n) is technically O(n) – DisappointedByUnaccountableMod Jul 10 '21 at 16:51
  • @T.R.Bhavanishankar note that this doesn't give you the first repeat, it gives you the lowest. A subtle but important difference if there's more than one repeat. It would be easy to fix though. – Mark Ransom Jul 10 '21 at 19:08
  • The question wouldn't have explicitly asked for the *first* duplicate unless there was a possibility of more than one. And when I said it would be easy to fix I meant it - just move the test for a duplicate into the first loop and ditch the second. – Mark Ransom Jul 11 '21 at 17:00
  • It will be in-order because you're traversing the list in-order for your first loop. To stop after the first duplicate found just use `break`. – Mark Ransom Jul 11 '21 at 17:49
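
Mark Ransom's suggestion in the comments above (test for the duplicate inside the first loop and stop at the first hit) could be sketched roughly as follows. The function name `first_repeat_add_n` is hypothetical, the list is modified in place and not restored, and values are assumed to lie in 1..N.

```python
def first_repeat_add_n(a):
    """Return the first repeated value in a (values in 1..len(a)), or None.

    Note: the list is modified in place and is not restored.
    """
    n = len(a)
    for i in range(n):
        value = a[i] % n or n   # original value at i; a remainder of 0 means n
        slot = value % n        # the value n is tracked in slot 0
        if a[slot] > n:         # slot already bumped once: value was seen before
            return value
        a[slot] += n
    return None


print(first_repeat_add_n([3, 1, 2, 5, 4, 3]))  # 3
print(first_repeat_add_n([2, 1, 2, 1]))        # 2 (first repeat, not the lowest)
```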

If you can alter the array, you can walk through the array repeatedly swapping the current element into its correct position until you find the duplicate.

// arr is 1-indexed here (arr[1..N]), so the value v belongs at index v
for i = 1 to N
  while arr[i] != i && arr[i] != arr[arr[i]] do
    swap(arr[i], arr[arr[i]])
  end
  if arr[i] != i
    return arr[i]
  end
end

It's unclear to some people why this is O(N).

When the outer loop encounters an element which isn't at the matching index, it will swap it with the element at that index.

Each swap reduces the number of elements not at their matching index by either 1 or 2. Since at most N elements can be out of place to begin with, there can be at most N swaps in total.

E.g., arr = [3, 5, 1, 4, 3] (1 indexed as in the OP's example).

swap 1: 3 with arr[3] yields [1, 5, 3, 4, 3]. Here we got lucky and reduced the number of elements not in the matching position by 2.

swap 2: 5 with arr[5] yields [1, 3, 3, 4, 5]

swap 3: 3 with arr[3]: we terminate because we found a match.
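
The walk-through above can be reproduced with a small runnable sketch. It uses Python's 0-based lists, so the value v belongs at index v - 1, and it also counts swaps to make the "at most N swaps" argument observable. The function name `find_duplicate_by_swapping` and the swap counter are additions for illustration only.

```python
def find_duplicate_by_swapping(arr):
    """Return (duplicate, swaps) for values in 1..len(arr); duplicate is None if absent.

    arr is reordered in place. Value v belongs at 0-based index v - 1.
    """
    swaps = 0
    for i in range(len(arr)):
        # Keep swapping until arr[i] is home or its home already holds the same value.
        while arr[i] != i + 1 and arr[i] != arr[arr[i] - 1]:
            home = arr[i] - 1
            arr[i], arr[home] = arr[home], arr[i]
            swaps += 1
        if arr[i] != i + 1:
            return arr[i], swaps
    return None, swaps


print(find_duplicate_by_swapping([3, 5, 1, 4, 3]))  # (3, 2)
```

Computing `home` before the swap avoids the pitfall of indexing with `arr[i]` after it has already been overwritten.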

Dave
  • Will this find the first duplicate or just a duplicate? Probably does not terminate if there is no duplicate or does it? – maraca Jul 10 '21 at 16:47
  • That’s going to be >O(N) and – DisappointedByUnaccountableMod Jul 10 '21 at 16:53
  • @barny There are a max of N swaps, since each swap moves an element into its final spot. – Dave Jul 10 '21 at 17:09
  • @maraca If there are no dupes then it terminates without returning anything (at the end of the for loop). It would be easy to return something in that case. – Dave Jul 10 '21 at 17:11
  • @maraca How is 'first duplicate' defined? E.g. in [1,2,2,1] is 1 or 2 the first dupe? – Dave Jul 10 '21 at 17:12
  • @Dave So your first iteration in the for loop will swap all the elements to their correct position and also finds the duplicate in it? – T.R. Bhavani shankar Jul 10 '21 at 17:15
  • @barny So this is O(N). The inner loop executes at most N times in total. – Dave Jul 10 '21 at 17:16
  • @T.R.Bhavanishankar It stops as soon as it finds the dupe, but until then it swaps the element i into position i. – Dave Jul 10 '21 at 17:48
  • @Dave Got it. So if the for loop runs more than one iteration, then this solution may not yield O(N) time complexity, right? It might be more than O(N). – T.R. Bhavani shankar Jul 10 '21 at 18:06
  • @T.R.Bhavanishankar If there are no duplicates and this does the max amount of work possible, it is still O(N). – Dave Jul 10 '21 at 18:31
  • The outer loop is clearly O(N). If the inner loop is O(N) as well, that makes the whole thing O(N^2). – Mark Ransom Jul 10 '21 at 19:20
  • @MarkRansom The inner loop is O(N) across all iterations of the outer loop, not per iteration of the outer loop. – Dave Jul 10 '21 at 19:26
  • That's not at all clear from a reading of the code. – Mark Ransom Jul 10 '21 at 19:28
  • @MarkRansom I added an example and more explanation. Basic idea is that each swap reduces the number of instances where arr[i] != i by at least 1, and there are at most N of these, so at most N total swaps. What may be making it tricky to think about is those may be allocated arbitrarily among the iterations of the outer loop. E.g. all N could come when i=1, or they could be smoothly distributed. – Dave Jul 10 '21 at 19:43
  • @MarkRansom It's not O(N^2), it's O(2N) in the worst case: if the inner loop runs for all N elements, then the outer loop runs without entering the inner loop, which is also N, so 2N in total. – Surt Jul 11 '21 at 20:53
  • @Dave In my opinion, for [1,2,2,1] the first duplicate is 2. You go from left to right and the first number that appears a second time is the first duplicate. Otherwise you would ask something like find the first number that has a duplicate (then it would be the 1 at the beginning). – maraca Jul 12 '21 at 09:14

Some people have mentioned using a bit array to store the seen values. Note that this will only work if N is less than 32 (or however many bits your language can handle in one integer; e.g. Python automatically handles integers wider than 32 bits).

Space Complexity: O(1)

def solution(a):
    bitarray = 0  # bit k is set once the value k has been seen

    for num in a:
        bitmask = 1 << num

        if (bitarray & bitmask) == 0:
            bitarray |= bitmask
        else:
            return num

    # not found
    return -1