Solving the "firstDuplicate" question in Python

Question

I'm trying to solve the following challenge from codesignal.com:

Given an array a that contains only numbers in the range from 1 to a.length, find the first duplicate number for which the second occurrence has the minimal index. In other words, if there are more than 1 duplicated numbers, return the number for which the second occurrence has a smaller index than the second occurrence of the other number does. If there are no such elements, return -1.

Example

For a = [2, 1, 3, 5, 3, 2], the output should be firstDuplicate(a) = 3.

There are 2 duplicates: numbers 2 and 3. The second occurrence of 3 has a smaller index than the second occurrence of 2 does, so the answer is 3.

For a = [2, 4, 3, 5, 1], the output should be firstDuplicate(a) = -1.

The execution time limit is 4 seconds.

The guaranteed constraints were:

1 ≤ a.length ≤ 10^5, and

1 ≤ a[i] ≤ a.length

So my code was:

def firstDuplicate(a):
    b = a
    if len(list(set(a))) == len(a):
        return -1

    n = 0
    answer = -1
    starting_distance = float("inf")

    while n!=len(a):
        value = a[n]

        if a.count(value) > 1:

            place_of_first_number = a.index(value)

            a[place_of_first_number] = 'string'

            place_of_second_number = a.index(value)

            if place_of_second_number < starting_distance:

                starting_distance = place_of_second_number
                answer = value

            a=b
        n+=1
        if n == len(a)-1:
            return answer 
    return answer

Of the site's 22 tests, I passed every one up to #21, which failed because the test list was large and the execution time exceeded 4 seconds. What are some tips for reducing the execution time while keeping the code more or less the same?

Why not just do a single pass through the list? Create a set, add element to the set if it doesn't exist in the set. If it does, it's a duplicate -- return it. If you get to the end of the list, return -1. — erip, Sep 21 '18 at 01:02
I'm voting to close this question as off-topic because questions about improving working code should go on codereview.stackexchange.com. — erip, Sep 21 '18 at 01:03
@erip Nothing about this question is off-topic, even if it would be on-topic for other SE sites. See [on-topic](https://stackoverflow.com/help/on-topic). — Patrick Haugh, Sep 21 '18 at 01:08
@PatrickHaugh The link you posted literally reads "If your question is not specifically on-topic for Stack Overflow, it may be on topic for another Stack Exchange site." And yes, questions about identifying bottlenecks are more appropriate for codereview. Specifically, this code works (1 is out), it presumably can be reproduced given a timeout (2 is out), it's not a homework problem (3 is out), it's not asking for references (4 is out), it's not about general computing (5 is out), and it's not about administration (6 is out). — erip, Sep 21 '18 at 10:49
@erip: You're reading that line backwards. The line says that questions off-topic for Stack Overflow may be on-topic for other sites, not that questions on-topic for other sites are off-topic for Stack Overflow. — user2357112, Sep 21 '18 at 18:31

score 5 · Accepted Answer · answered Sep 21 '18 at 01:18

As @erip has pointed out in the comments, you can iterate through the list and add each item to a set. If an item is already in the set, it is the duplicate whose second occurrence has the lowest index, so you can return it immediately. If you reach the end of the loop without finding a duplicate, return -1:

def firstDuplicate(a):
    seen = set()
    for i in a:
        if i in seen:
            return i
        seen.add(i)
    return -1
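
As a quick sanity check, the snippet below runs this approach against the two examples from the question. The function is repeated so the snippet runs on its own, and the `worst` list is a made-up input at the stated size limit, not one of the site's actual tests.

```python
def firstDuplicate(a):
    # Same single-pass set approach as above, repeated so this runs standalone.
    seen = set()
    for i in a:
        if i in seen:
            return i
        seen.add(i)
    return -1

# The two examples from the question.
print(firstDuplicate([2, 1, 3, 5, 3, 2]))  # 3
print(firstDuplicate([2, 4, 3, 5, 1]))     # -1

# Hypothetical worst case at the size limit: the only duplicate is the
# very last element, so the whole list has to be scanned once.
n = 10**5
worst = list(range(1, n)) + [1]
print(firstDuplicate(worst))  # 1
```

Each element is looked up in the set and added to it at most once, so the work grows roughly linearly with the list length, whereas the code in the question calls a.count and a.index (each a full scan) inside the loop.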

blhsing


See: https://stackoverflow.com/questions/46513358/finding-the-first-duplicate-of-an-array – PatrickT Jul 05 '21 at 10:10

score 2 · Answer 2 · answered Sep 26 '20 at 07:10

Create a new set and check whether each element is already in it; if it is, return that element:

def firstDuplicate(a):
    dup = set()
    for i in range(len(a)):
        if a[i] in dup:
            return a[i]
        else:
            dup.add(a[i])
    return -1

Philip Tzou · Answer 3 · 2018-09-21T19:12:01.687

This is just an idea; I didn't verify it, but it should work. There seems to be a time limit but no memory limit, so trading space for time is probably a practical approach. The time complexity is O(n). The algorithm also relies on the constraint that every number is in the range 1 to len(a).

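def first_duplicate(a):
    len_a = len(a)
    # b[v - 1] tracks value v: len_a + 1 = not seen yet, len_a = seen once,
    # anything smaller = index of v's second occurrence.
    b = [len_a + 1] * len_a
    for i, n in enumerate(a):
        n0 = n - 1
        if b[n0] == len_a + 1:
            b[n0] = len_a
        elif b[n0] == len_a:
            b[n0] = i
    # Pick the value whose second occurrence has the smallest index.
    min_i = len_a
    min_n = -1
    for n0, i in enumerate(b):
        if i < min_i:
            min_i = i
            min_n = n0 + 1
    return min_n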

Update:

This solution is not as fast as the set() solution by @blhsing. However, the result might differ if it were implemented in C. The comparison is somewhat unfair, since set is a built-in type implemented in C, like the other core parts of CPython.
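
To compare the two approaches yourself, a rough timing sketch with the standard timeit module could look like the following. The function names `first_duplicate_set` and `first_duplicate_table` and the helper `make_input` are made up for this sketch, and the input is a hypothetical worst case rather than the site's test data, so the numbers only indicate relative speed on your machine.

```python
import timeit

def first_duplicate_set(a):
    # Single pass with a set, as in the accepted answer.
    seen = set()
    for x in a:
        if x in seen:
            return x
        seen.add(x)
    return -1

def first_duplicate_table(a):
    # Table-based approach from this answer.
    len_a = len(a)
    b = [len_a + 1] * len_a
    for i, n in enumerate(a):
        n0 = n - 1
        if b[n0] == len_a + 1:
            b[n0] = len_a
        elif b[n0] == len_a:
            b[n0] = i
    min_i = len_a
    min_n = -1
    for n0, i in enumerate(b):
        if i < min_i:
            min_i = i
            min_n = n0 + 1
    return min_n

def make_input(n):
    # Hypothetical worst case: the only duplicate is the last element.
    return list(range(1, n)) + [1]

data = make_input(10**5)

# Time 10 calls of each function on the same input.
for func in (first_duplicate_set, first_duplicate_table):
    seconds = timeit.timeit(lambda: func(data), number=10)
    print(f"{func.__name__}: {seconds:.4f} s for 10 runs")
```

Both functions should return the same answer on any valid input; only the running time differs.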
