10

I have an assignment to create an algorithm that finds duplicates in an array of numeric values. The assignment does not say what kind of numbers they are, integers or floats. I have written the following pseudocode:

 FindingDuplicateAlgorithm(A) // A is the array
      mergeSort(A);
      for  int i <- 0 to i<A.length
           if A[i] == A[i+1]
                 i++
               return  A[i]
           else
                 i++

Have I created an efficient algorithm? I think there is a problem with it: it returns duplicate numbers several times. For example, if the array contains 2 at two indexes, I will get ...2, 2,... in the output. How can I change it to return each duplicate only once? I think it is a good algorithm for integers, but does it also work well for floating-point numbers?

Paul R
Elton.fd
  • 2
    Be careful of using A[i+1] -- if i = (A.length - 1), Bad Things will happen. You want the for loop to continue only when i < A.length - 1. – Seth Nov 16 '10 at 09:45

6 Answers

12

To handle duplicates, you can do the following:

if A[i] == A[i+1]:
    result.append(A[i])  # collect found duplicates in a list
    # skip the entire range of duplicates until a new value is found,
    # without reading past the last element
    while i + 1 < A.length and A[i] == A[i+1]:
        i++
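As a rough illustration, the fragment above could be fleshed out in Java as follows. This is only a sketch under assumptions: the class and method names are hypothetical, the input is taken to be an `int[]`, and `Arrays.sort` stands in for the `mergeSort(A)` call from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
final class SortedDuplicates {
    // Returns each duplicated value exactly once, in ascending order.
    static List<Integer> findDuplicates(int[] input) {
        int[] a = input.clone();   // sort a copy so the caller's array is untouched
        Arrays.sort(a);            // stands in for mergeSort(A) from the question
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < a.length - 1) { // stop one short: a[i + 1] must exist
            if (a[i] == a[i + 1]) {
                result.add(a[i]);  // report the value once
                // skip the rest of the run of equal values
                while (i < a.length - 1 && a[i] == a[i + 1]) {
                    i++;
                }
            }
            i++;
        }
        return result;
    }
}
```

Both loops check `i < a.length - 1` before touching `a[i + 1]`, so the scan never reads past the last element, even when the array ends in a run of duplicates.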
Björn Pollex
  • +1 But detecting duplicate floating points is not more tricky than detecting duplicate ints. Two floating point values are identical if and only if `value1 == value2`. – Andreas Brinck Nov 16 '10 at 09:51
  • 2
    @Andreas: You are right, but the words *equal* and *duplicate* mean something different for floating point numbers. – Björn Pollex Nov 16 '10 at 09:53
  • 2
    No I don't think so. A value `a` is a duplicate of another value `b` if and only if `a == b`, there's no other way to define it. – Andreas Brinck Nov 16 '10 at 09:55
  • mergeSort(Arr); int i <- 0 for i<- Arr.lenght-1 if Arr[i] == Arr[i+1] return Arr[i] while A[i] = A[i+1] i++ – Elton.fd Nov 16 '10 at 10:11
  • @Sandra: I was just posting the relevant part. – Björn Pollex Nov 16 '10 at 10:14
  • @Sandra: Good, but don't post code in comments, it is unreadable :). – Björn Pollex Nov 16 '10 at 10:16
  • @Sandra: You can accept an answer as the correct solution using the tick to the left. This will reward the people that take the time to answer. – Björn Pollex Nov 16 '10 at 10:18
  • The actual floating-point numbers are easy to compare. They're equal (and duplicates) iff a == b. However, the numbers that a and b represent may be different (and duplicate), as long as the closest floating-point representation are the same. That's more an issue between "real world" and storage than an issue for the storage itself. – Vatine Nov 16 '10 at 10:19
  • @Space_C0wb0y I would strike the first paragraph of the answer since it's not correct. What is true is that most *real numbers* cannot be accurately represented as an IEEE 754 float. – Andreas Brinck Nov 16 '10 at 10:48
  • @Andreas: I struck it. My point however was to caution Sandra that floating points behave differently than integers. Mathematically equivalent expressions can yield results that compare not equal. – Björn Pollex Nov 16 '10 at 12:04
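To make the point of this comment thread concrete, here is a small Java sketch. The class name, method names, and the `epsilon` parameter are hypothetical, and whether a tolerance is appropriate at all depends on what "duplicate" should mean for the assignment.

```java
// Hypothetical helper class illustrating the discussion above.
final class FloatDuplicates {
    // Exact comparison: true only when both doubles hold the same value.
    static boolean sameValue(double x, double y) {
        return x == y;
    }

    // Tolerance-based comparison; epsilon is an application-specific choice.
    static boolean nearlyEqual(double x, double y, double epsilon) {
        return Math.abs(x - y) <= epsilon;
    }
}
```

For example, `sameValue(0.1 + 0.2, 0.3)` is false because the sum is stored as 0.30000000000000004, while `nearlyEqual(0.1 + 0.2, 0.3, 1e-9)` is true. So `==` does detect identical stored values, as Andreas says, but two mathematically equivalent computations may not produce identical stored values.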
10

Do you want to find Duplicates in Java?

You may use a HashSet.

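// requires: import java.util.HashSet; import java.util.Set;
Set<Object> seen = new HashSet<>();
for (Object a : A) {
    // add() returns false when the element is already in the set
    boolean duplicate = !seen.add(a);
    if (duplicate) {
        // do something with a
    }
}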

The return value of add() is defined as:

true if the set did not already contain the specified element.
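Building on that return value, one possible way to report each duplicate only once is to collect the duplicates in a second set. The sketch below is an assumption-laden example rather than part of the original answer: the class and method names are made up, and it takes a `double[]` to show that the same approach also works for floating-point input.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper class, not part of the original answer.
final class HashSetDuplicates {
    // Returns every value that occurs more than once, each listed a single time.
    static Set<Double> findDuplicates(double[] values) {
        Set<Double> seen = new HashSet<>();
        Set<Double> duplicates = new LinkedHashSet<>(); // keeps first-detection order
        for (double v : values) {
            if (!seen.add(v)) {      // add() returned false: v was already present
                duplicates.add(v);   // a set, so repeated hits are stored only once
            }
        }
        return duplicates;
    }
}
```

Note that the sets compare boxed `Double` objects with `equals()`, which differs from `==` on primitives in two corner cases: `NaN` is equal to itself, and `0.0` is not equal to `-0.0`.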

EDIT: I know HashSet is optimized for insert and contains operations, but I'm not sure whether it's fast enough for your needs.

EDIT2: I see you recently added the homework tag. I would not recommend my answer if it is homework, because it may be too "high-level" for an algorithms lesson.

http://download.oracle.com/javase/1.4.2/docs/api/java/util/HashSet.html#add%28java.lang.Object%29

Christian Kuetbach
2

Your approach seems pretty good. First sorting and then simply checking neighboring values gives you O(n log(n)) complexity, which is quite efficient.

Merge sort is O(n log(n)) while checking neighboring values is simply O(n).

One thing though (as mentioned in one of the comments): your pseudocode will read past the end of the array, which in Java throws an ArrayIndexOutOfBoundsException. The loop should be (in Java):

for (int i = 0; i < array.length - 1; i++) {
    ...
}

Also, if you actually want to display which numbers (and/or indexes) are the duplicates, you will need to store them in a separate list.

Nico Huysamen
1

O(n) algorithm (on average): traverse the array and try to insert each element into a hash table/set, using the number as the hash key. If an element cannot be inserted, then it is a duplicate.

Maksood
  • 1
    This seems to be the same as http://stackoverflow.com/a/4192865 . Please only post an answer if you have something new to say. And if you do, please expand your answer. – Jeffrey Bosboom Mar 03 '15 at 02:22
  • 2 things different in my post: mention of complexity and fact that you have to 'try' to insert the value from .NET perspective. In fact, the code listed in your link will throw an exception for dups in .NET CLR since it will try to insert a key that already exist. In .NET, you have to use trygetvalue() before insertion. – Maksood Apr 08 '15 at 22:18
0
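// Precondition (assumed; the answer does not state it): every value v satisfies
// 0 <= v < inputArray.length, because values are used as indexes.
// Note: the array is modified in place (entries are negated as "seen" marks).
public void printDuplicates(int[] inputArray) {
    if (inputArray == null) {
        throw new IllegalArgumentException("Input array can not be null");
    }
    int length = inputArray.length;

    if (length == 1) {
        return; // a single element cannot have a duplicate
    }

    for (int i = 0; i < length; i++) {
        int index = Math.abs(inputArray[i]); // original value, ignoring any mark

        if (inputArray[index] >= 0) {
            // first time this value is seen: mark it by negating the slot it points to
            // (a slot holding 0 cannot be marked, so repeats of its index are missed)
            inputArray[index] = -inputArray[index];
        } else {
            // slot already marked, so this value occurred before
            System.out.print(index + " ");
        }
    }
}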
smaiakov
  • 1
    Please explain your answer. SO exists to educate people, not just answer questions – Machavity Oct 21 '15 at 18:37
  • Sure. The main idea is to use the numbers in the array as indexes. Step 1: in the loop, flip the sign of the number at index inputArray[i]. Step 2: check whether that number is already negative. If so, some earlier element pointed at the same position and already changed it. – smaiakov Feb 05 '16 at 10:23
  • 2
    @smaiakov, What if the array element itself is larger than the array size? We will get out of bound exception. – Kiran Aug 16 '17 at 15:39
0

Your algorithm contains a buffer overrun. i starts with 0, so I assume the indexes into array A are zero-based, i.e. the first element is A[0], the last is A[A.length-1]. Now i counts up to A.length-1, and in the loop body accesses A[i+1], which is out of the array for the last iteration. Or, simply put: If you're comparing each element with the next element, you can only do length-1 comparisons.

If you only want to report duplicates once, I'd use a bool variable firstDuplicate, that's set to false when you find a duplicate and true when the number is different from the next. Then you'd only report the first duplicate by only reporting the duplicate numbers if firstDuplicate is true.
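The flag idea described above might look like this in Java. This is a sketch only: the class and method names are hypothetical, and the array is assumed to be sorted already (for instance by the `mergeSort(A)` call in the question).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
final class FlagDuplicates {
    // Expects a sorted array; reports each duplicated value once.
    static List<Integer> findDuplicates(int[] sorted) {
        List<Integer> result = new ArrayList<>();
        boolean firstDuplicate = true;
        // only length - 1 comparisons, so sorted[i + 1] always exists
        for (int i = 0; i < sorted.length - 1; i++) {
            if (sorted[i] == sorted[i + 1]) {
                if (firstDuplicate) {
                    result.add(sorted[i]);   // report the first hit of this run
                    firstDuplicate = false;  // suppress further hits in the same run
                }
            } else {
                firstDuplicate = true;       // a new value starts, re-arm the flag
            }
        }
        return result;
    }
}
```

With the sorted input `{1, 2, 2, 2, 3, 3, 4}`, the method returns a list containing 2 and 3, each once.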

Niki