14

This is kind of a homework question. I've been thinking about it for quite a while and have come up with a couple of solutions, but I think a better one exists.

What's the fastest way to determine whether an int array contains an element that appears only once? Any element can appear any number of times. {3, 1, 4, 1, 4, 3} should return false, while {3, 1, 4, 1, 4, 1} should return true (3 appears once).

We are only allowed to use things we have already learned (all the basics, recursion, OOP, searching and sorting algorithms, including quicksort), so making a hash table is not an option.

So far the best practical solution I have come up with is sorting the array with quicksort and then scanning it once ( O(nlogn) ). The best impractical solution I have come up with is making a big array with one slot for every possible int value and indexing into it by value, similar to a hash table ( O(n) ), but that array is WAY too big to actually implement.
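The sort-then-scan idea can be sketched roughly as follows. This is only an illustration: the class and method names are made up, and `Arrays.sort` stands in for the hand-written quicksort that the coursework restrictions would require.

```java
import java.util.Arrays;

class SortAndScan {
    // Returns true if some value occurs exactly once in data.
    static boolean hasSingle(int[] data) {
        int[] sorted = Arrays.copyOf(data, data.length); // leave the caller's array untouched
        Arrays.sort(sorted); // assumption: stand-in for a hand-written quicksort

        int i = 0;
        while (i < sorted.length) {
            // Find the end of the run of values equal to sorted[i].
            int j = i + 1;
            while (j < sorted.length && sorted[j] == sorted[i]) {
                j++;
            }
            if (j - i == 1) {
                return true; // a run of length 1 is a value that appears once
            }
            i = j; // jump to the start of the next run
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasSingle(new int[] {3, 1, 4, 1, 4, 3})); // false
        System.out.println(hasSingle(new int[] {3, 1, 4, 1, 4, 1})); // true
    }
}
```

The scan is a single pass, so the sort dominates the running time.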

Is there another (practical) way to do this in O(n) time?

EDIT: I just got an answer from the TA. The suggested O(n) solution that I had heard about was an impractical one (the same as or similar to what I suggested), and hence they told us not to use it. I'm now 99% sure that the best practical answer (without hash tables) is O(nlogn) time.

Cœur
kostia
  • 1
    You could create a Map, where the key would be the number in the array and you would increment the value for every occurrence in the array. Then find all the keys where the value is 1. – NeplatnyUdaj May 15 '13 at 15:55
  • 2
    @NeplatnyUdaj OP: "making a hash table is not an option" – zw324 May 15 '13 at 15:55
  • can't use Map, it's not part of the learned material, thought about it. – kostia May 15 '13 at 15:55
  • kkaploon: What about Set? You could just throw the numbers in the set and keep track for which values the set.add method returns false(value already in the set) – NeplatnyUdaj May 15 '13 at 15:59
  • @NeplatnyUdaj Depends on the implementation of the Set. A Set is a container and not a data structure. – Vivin Paliath May 15 '13 at 15:59
  • @NeplatnyUdaj Can't use any custom Java datatype, only primitives and [] type arrays – kostia May 15 '13 at 16:01
  • This smells a LOT like a variation of [element distinctness problem](http://en.wikipedia.org/wiki/Element_distinctness_problem) (but reversed) which is solved in `O(nlogn)` without hashing. – amit May 15 '13 at 16:02
  • 3
    Also, assuming ints in Java are 32 bits - [radix sort](https://en.wikipedia.org/wiki/Radix_sort) gives you `O(d*n)` where d=32, so `O(n)`. Though arrays are limited to <2^32 size, so logn is also smaller than 32. No practical gain here – amit May 15 '13 at 16:05
  • @nickecarlo: which number would you binary search for? – jlordo May 15 '13 at 16:19
  • @alex23 '(all the basics, recursion, oop, searching and sorting algos, including quicksort)' only allowed to use primitives and [] arrays – kostia May 15 '13 at 16:19
  • @nickecarlo there's a way to do it with quicksort (not exactly like you described) but quicksort itself is O(nlogn) – kostia May 15 '13 at 16:20
  • @nickecarlo no, I'm searching for 'a number' that appears once, I don't know which one it is. – kostia May 15 '13 at 16:21
  • 1
    @kkaploon oh okay. Sorry I misunderstood. – Nico May 15 '13 at 16:21
  • @kkaploon This: http://stackoverflow.com/questions/7338070/finding-an-element-in-an-array-where-every-element-is-repeated-odd-number-of-tim relevant to your problem? – Nico May 15 '13 at 16:25
  • @nickecarlo no because in my problem the other numbers can appear any number of times, they are not limited to odd or even times. – kostia May 15 '13 at 16:31
  • Please post answers as answers, not as comments (comments are hard to follow for people coming to the site, and they can't be voted on the same way answers can). If you're leaving a comment as clarification, please edit it into the post it's clarifying, to make it easier for users to find. – George Stocker May 15 '13 at 16:51
  • @kkaploon A set is indeed an ADT, and is therefore a valid option: http://en.wikipedia.org/wiki/Set_(abstract_data_type) – Kenogu Labz May 15 '13 at 16:52
  • @KenoguLabz Set is a data structure, the only data structure I'm allowed to use is of the [] type – kostia May 15 '13 at 16:58
  • 2
    Do you have actual memory constraints? You really only need 2 bits per possible int, so you could use the big array solution and only need about 1GB. – Aaron Dufour May 15 '13 at 21:45
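The 2-bits-per-value idea from the last comment could look roughly like the sketch below. The class name and the `min`/`max` parameters are assumptions added so the demonstration stays small; covering the full int range would need 2^32 two-bit counters, which is the roughly 1 GB the comment estimates.

```java
class TwoBitCounts {
    // Returns true if some value occurs exactly once in data.
    // Assumption of this sketch: every value in data lies in [min, max].
    static boolean hasSingle(int[] data, int min, int max) {
        long range = (long) max - min + 1;                   // number of possible values
        long[] table = new long[(int) ((range + 31) / 32)];  // 32 two-bit counters per long

        // First pass: count each value, saturating at 2 ("two or more").
        for (int v : data) {
            long idx = (long) v - min;
            int word = (int) (idx / 32);
            int shift = (int) (idx % 32) * 2;
            long count = (table[word] >>> shift) & 3L;
            if (count < 2) {
                table[word] += 1L << shift;
            }
        }

        // Second pass: look for a value whose counter stayed at 1.
        for (int v : data) {
            long idx = (long) v - min;
            int word = (int) (idx / 32);
            int shift = (int) (idx % 32) * 2;
            if (((table[word] >>> shift) & 3L) == 1L) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasSingle(new int[] {3, 1, 4, 1, 4, 3}, 0, 7)); // false
        System.out.println(hasSingle(new int[] {3, 1, 4, 1, 4, 1}, 0, 7)); // true
    }
}
```

Both passes go over the input rather than the table, so the work after allocating the table is proportional to n.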

3 Answers

5

You could use a customised quicksort to find distinct values without iterating over the sorted array afterwards.

When you have chosen a pivot value and are moving through the respective part of the array, discard any value that matches the pivot, and discard the pivot value itself once you have moved through that part of the array. This removes duplicates BEFORE the array is eventually sorted.

For example:

Sorting [5, 1, 4, 1, 4, 1]
If you choose the pivot as 4, you'd end up with the 2 sub arrays being:
[1, 1, 1] and [5]

If your pivot is never discarded, it is distinct. If it is discarded, repeat the same process on the sublists. If a sublist has only 1 element, that element is distinct.

In this way you can pick up distinct values MUCH earlier.

Edit: Yes this is still bounded by O(nlogn) ( I think ?)
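One possible reading of the approach above, as a sketch only. Instead of physically discarding values, it uses a three-way partition that groups all copies of the pivot in the middle and then ignores that group; the class and method names are made up for illustration.

```java
import java.util.Arrays;

class QuickSingles {
    // Returns true if some value occurs exactly once in data.
    static boolean hasSingle(int[] data) {
        int[] work = Arrays.copyOf(data, data.length); // partitioning reorders the array
        return search(work, 0, work.length - 1);
    }

    // Searches a[lo..hi] (inclusive) for a value that occurs exactly once.
    private static boolean search(int[] a, int lo, int hi) {
        if (lo > hi) return false;   // empty sublist
        if (lo == hi) return true;   // a sublist with one element: that value is distinct

        int pivot = a[lo + (hi - lo) / 2];
        int lt = lo, i = lo, gt = hi;
        // Invariant: a[lo..lt-1] < pivot, a[lt..i-1] == pivot, a[gt+1..hi] > pivot
        while (i <= gt) {
            if (a[i] < pivot) swap(a, lt++, i++);
            else if (a[i] > pivot) swap(a, i, gt--);
            else i++;
        }

        if (gt - lt + 1 == 1) return true; // the pivot value was never matched
        // All copies of the pivot are "discarded"; recurse on the two sides only.
        return search(a, lo, lt - 1) || search(a, gt + 1, hi);
    }

    private static void swap(int[] a, int x, int y) {
        int tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }

    public static void main(String[] args) {
        System.out.println(hasSingle(new int[] {5, 1, 4, 1, 4, 1})); // true
        System.out.println(hasSingle(new int[] {3, 1, 4, 1, 4, 3})); // false
    }
}
```

Because every copy of a value ends up in the same sublist, a sublist of length 1 can only hold a value that occurs once in the whole array. With consistently bad pivots the recursion can get as deep as the array is long, as with plain quicksort.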

Ben Meier
  • 3
    +1 In worst case it is O(nlogn) but I upvoted for suggesting the use of sorting to figure out the problem. This probably so far IS the best solution because he doesn't have to iterate over it after sorting. – Nico May 15 '13 at 16:31
  • This doesn't exactly answer my question, but seeing as my specific question (an algorithm with O(n) time without hash tables) cannot be answered, this is the closest thing to it, as it improves my answer (even if keeping the same time complexity). – kostia May 19 '13 at 15:41
0

You essentially have to do a bubble-sort style compare. There's no built-in function to answer the problem, and even if you sort, you still have to iterate over every element (even just to find where groups break). You could take more complicated approaches with multiple arrays, especially if you need to find which elements appear only once.

But once you find one that appears once, you can break. This code would do it. It's O(n^2), but I'm not sure you can do faster for this problem.

boolean anySingles(int[] data)
{
 outer:
 for (int i = 0; i < data.length; i++)
 {
  for (int j = 0; j < data.length; j++)
  {
   if (i != j)
   {
    // data[i] has a duplicate, so move on to the next candidate
    if (data[i] == data[j]) continue outer;
   }
  }
  // made it to the end without finding a duplicate
  return true;
 }
 return false;
}
user1676075
0

Let's do an experiment:

package test;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/**
 * Created with IntelliJ IDEA.
 * User: Nicholas
 * Date: 15.05.13
 * Time: 21:16
 */
public class Searcher {

    private static boolean searchBySorting(int [] array){
        int [] newArray = new int[array.length];
        System.arraycopy(array, 0, newArray,0, array.length);

        Arrays.sort(newArray);
        for (int i = 0; i < newArray.length - 1; ++i){
            if(newArray[i] == newArray[i + 1]){
                return true;
            }
        }

        return false;
    }

    private static boolean searchByCompare(int [] array){
        int [] newArray = new int[array.length];
        System.arraycopy(array, 0, newArray,0, array.length);

        for (int i = 0; i < newArray.length - 1; ++i){
            int value = newArray[i];
            for(int j = i + 1; j < newArray.length; ++j){
                if(value == newArray[j]){
                    return true;
                }
            }
        }

        return false;
    }

    private static boolean searchBySet(int [] array){
        int [] newArray = new int[array.length];
        System.arraycopy(array, 0, newArray,0, array.length);

        Set<Integer> set = new HashSet<Integer>();
        for (int i = 0; i < newArray.length; ++i){
            if(set.contains(newArray[i])){
                return true;
            }

            set.add(newArray[i]);
        }

        return false;
    }

    private static int [] generateRandomArray(){
        Random random = new Random();
        int size = random.nextInt(1000) + 100;
        int [] array = new int[size];

        for (int i = 0; i < size; ++i){
            array[i] = random.nextInt();
        }

        return array;
    }

    public static void main(String [] args){

        long sortingTime = 0;
        long compareTime = 0;
        long setTime = 0;

        for (int i = 0; i < 1000; ++i){
            int [] array = generateRandomArray();

            long begin = System.currentTimeMillis();
            for(int j = 0; j < 100; ++j){
                searchBySorting(array);
            }
            long end = System.currentTimeMillis();
            sortingTime += (end - begin);

            begin = System.currentTimeMillis();
            for(int j = 0; j < 100; ++j){
                searchByCompare(array);
            }
            end = System.currentTimeMillis();
            compareTime += (end - begin);

            begin = System.currentTimeMillis();
            for(int j = 0; j < 100; ++j){
                searchBySet(array);
            }
            end = System.currentTimeMillis();
            setTime += (end - begin);
        }

        System.out.println("Search by sorting: " + sortingTime + " ms");
        System.out.println("Search by compare: " + compareTime + " ms");
        System.out.println("Search by insert: " + setTime + " ms");
    }
}

My results:

Search by sorting: 2136 ms

Search by compare: 11955 ms

Search by insert: 4151 ms

Are there any questions?

PS. The best algorithm I know is Tortoise and hare

gluckonavt
  • Yes, I have a question: What is your conclusion, is there any better solution than the one in the question which runs in O(nlogn) without violating restrictions (non use of hash tables)? – zafeiris.m May 15 '13 at 18:04
  • This answer looks like it returns true if *any* element appears at least twice, not if *every* element appears at least twice. – Dave L. Aug 02 '13 at 05:26