
Recently I had an interview question to write an algorithm which analyses an array and returns the numbers that are duplicates.

My brute force solution was:

    public static ArrayList getDuplicates(int[] input) {
        ArrayList duplicates = new ArrayList();
        int marker = 0;
        for (int i = marker + 1; (i < input.length) && (marker < input.length - 1); i++) {
            if (input[marker] == input[i]) {
                duplicates.add(input[marker]);
                marker++;
                continue;
            } else {
                // no match: if we reached the end, restart the scan from the next marker
                if (i == input.length - 1) {
                    marker++;
                    i = marker;
                }
                continue;
            }
        }
        return duplicates;
    }

Without thorough analysis, I replied that the Big O is O(n log n).

After the interview I checked again and found that this was not a correct answer.

The confusing part is that the scan is repeated, though not n times but n - k times on each pass, where k = {1..n-1}. This is the part which resets the moving index:

    if (i == input.length - 1) {
        marker++;
        i = marker;
    }
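
One thing I tried afterwards was to instrument the loop and simply count how often the body runs (a rough sketch of my own; `LoopCounter` and `countIterations` are names I invented):

```java
// A rough instrumented sketch: reuse the loop structure above,
// but count how many times the body executes.
public class LoopCounter {
    static long countIterations(int[] input) {
        long count = 0;
        int marker = 0;
        for (int i = marker + 1; (i < input.length) && (marker < input.length - 1); i++) {
            count++; // one comparison per pass through the body
            if (input[marker] == input[i]) {
                marker++;
            } else if (i == input.length - 1) {
                marker++;   // no match anywhere: advance the marker...
                i = marker; // ...and reset the moving index
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            int[] distinct = new int[n];
            for (int j = 0; j < n; j++) distinct[j] = j; // duplicate-free input
            System.out.println(n + " -> " + countIterations(distinct));
            // prints 10 -> 45, 100 -> 4950, 1000 -> 499500, i.e. n*(n-1)/2
        }
    }
}
```

On duplicate-free input the count comes out to n(n-1)/2, which at least shows the growth is quadratic, though I would still like a principled way to derive it.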

What is the best approach to analysing this algorithm in order to find the correct Big O function?

zakb
  • Same answer every time: it depends. On who is doing the analysis, for one thing. Does this procedure always terminate? – greybeard Nov 22 '16 at 18:28
  • 2
    Time complexity appears to be n^2 in the worst case since you are comparing every element against every other element if you find no match. Essentially you mus do input.length choose 2 comparisons if none are found. This number of comparisons increases at a quadratic rate. – Luke Kot-Zaniewski Nov 22 '16 at 18:28
  • 1
    A Dictionary/hashtable for the duplicates would have made this quite a bit easier for you, and would have moved the complexity down to O(n) at the cost of memory. – Michael Dorgan Nov 22 '16 at 18:54
  • Big O notation is about generalities -- you can see this is > O(n), so all the extra work is not worth it -- better to write an O(n) solution. – Hogan Nov 22 '16 at 18:56
  • 4
    Why are you using `ArrayList`? It's been obsolete for like a decade now. If I was conducting this interview I'd have stopped paying attention to your code as soon as I saw that, knowing that you wouldn't be qualified. – Servy Nov 22 '16 at 19:01
  • @greybeard Yes, it always terminates, as it must satisfy two conditions: (i < input.length) && (marker < input.length - 1), and i is always increased on every iteration, so at some point at least one of the conditions will certainly fail. – zakb Nov 22 '16 at 19:45
  • @Servy Could you please explain "Why are you using ArrayList? It's been obsolete for like a decade now."? Why is it obsolete? What alternative would you use in this case? – zakb Nov 22 '16 at 19:48
  • @MichaelDorgan I mentioned that this was a brute force solution. After being asked how I would improve it, I replied that I would probably use a hash table for O(1) lookup to find the duplicates. So yes, I agree with you. – zakb Nov 22 '16 at 19:52
  • @zakb I would expect any applicant to any job (including an internship, even as a first-year student), if they didn't know the answer to that question, to be able to easily determine it on their own, given that it is very trivially looked up. – Servy Nov 22 '16 at 19:55
  • @Servy I guess you are referring to ArrayList in C#, because it doesn't support generics and is not type safe. However, my code is in Java. – zakb Nov 22 '16 at 20:14
  • @zakb And is also not generic, and thus has exactly the same problems. – Servy Nov 22 '16 at 20:35
  • This is why I think the brute force method in the question is not correct: can just one for loop solve the problem by brute force? – shawn Nov 23 '16 at 00:15

1 Answer


The way I would analyse this is to plug in edge cases and see if any patterns emerge:

  1. What happens if you have an array where everything matches? In this case it is O(N), because you hit a duplicate on the first comparison each time.
  2. What about no duplicates? Then you scan through the array over and over, making roughly N^2/2 comparisons in total, so O(N^2).

From this we can see that your best case is O(N) and your worst is O(N^2), and I would say that the average case is also going to be O(N^2).

The N^2 behavior would have been much easier to spot if you had used nested for-loops: one for the original data scan, and one for the duplicate scan.
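
For illustration, a nested-loop sketch (my own code, not the poster's; it differs slightly in which occurrences it reports) might look like:

```java
import java.util.ArrayList;
import java.util.List;

// Nested-loop duplicate scan: the two loops make the
// quadratic comparison count explicit in the structure.
public class NestedScan {
    public static List<Integer> getDuplicates(int[] input) {
        List<Integer> duplicates = new ArrayList<>();
        for (int i = 0; i < input.length; i++) {         // outer scan: N items
            for (int j = i + 1; j < input.length; j++) { // inner scan: up to N-1 comparisons
                if (input[i] == input[j]) {
                    duplicates.add(input[i]);
                    break; // report this position once and move on
                }
            }
        }
        return duplicates;
    }
}
```

The inner loop runs up to N-1 times for each of the N outer iterations, so the N^2/2 comparison count is visible right in the code.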

If instead you had added each entry to a container with O(1) insertion and lookup (a hash table), your algorithm would become much simpler:

  1. For each item in the input array, attempt to add the value to the hash table.
  2. If the value already exists in the hash table, put it into the duplicate array.
  3. Finish the scan and return the duplicate array.
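
A minimal sketch of those steps (my own illustration; `HashSet.add` returns `false` when the value is already present):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hash-based duplicate detection: each element costs O(1)
// expected work, so the whole scan is O(N).
public class HashScan {
    public static List<Integer> getDuplicates(int[] input) {
        Set<Integer> seen = new HashSet<>();
        List<Integer> duplicates = new ArrayList<>();
        for (int value : input) {
            if (!seen.add(value)) {    // add fails => value was seen before
                duplicates.add(value); // record the duplicate occurrence
            }
        }
        return duplicates;
    }
}
```

Note that a value occurring k times is reported k-1 times; dedupe the result as well if you only want each duplicated value once.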

Michael Dorgan