11

I have an unsorted list and want to know whether all items in it are unique.
My naive approach would be:

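import scala.collection.immutable.HashSet  // assumed; the original does not say which HashSet is meant

val l = List(1,2,3,4,3)
def isUniqueList(l: List[Int]) = (HashSet[Int]() ++ l).size == l.size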

Basically, I'm checking whether a Set containing all elements of the list has the same size as the list (since an item appearing twice in the original list will only appear once in the set), but I'm not sure whether this is the ideal solution for this problem.

Edit: I benchmarked the three most popular solutions: l==l.distinct, l.size==l.distinct.size, and Alexey's HashSet-based solution. Each function was run 1000 times on four lists: a unique list of 10 items, a unique list of 10000 items, and the same two lists with one item from the third quarter copied to the middle of the list. Before each run, each function was called 1000 times to warm up the JIT, and the whole benchmark was run 5 times before the times were taken with System.currentTimeMillis. The machine was a C2D P8400 (2.26 GHz) with 3 GB RAM; the Java version was the OpenJDK 64-bit Server VM (1.6.0.20), run with the arguments -Xmx1536M -Xms512M.

The results (times in milliseconds):

l.size==l.distinct.size (3, 5471, 2, 6492)
l==l.distinct           (3, 5601, 2, 6054)
Alexey's HashSet        (2, 1590, 3, 781)

The results with larger objects (Strings from 1KB to 5KB):

l.size==l.distinct.size (4, 5566, 7, 6506)
l==l.distinct           (4, 5926, 3, 6075)
Alexey's HashSet        (2, 2341, 3, 784)

The HashSet-based solution is clearly the fastest on the large lists, and, as Alexey already pointed out, comparing with .size instead of == doesn't make a major difference.
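
The measurement procedure described above can be sketched roughly as follows. This is not the original benchmark code, which was not posted: the helper `time`, the names `bySize`, `byEquals`, `byHashSet`, and the exact way the duplicate is placed are assumptions for illustration. It is also not stated which of Alexey's variants was measured; the mutable one is assumed here.

```scala
import scala.collection.mutable

// The three candidates (names are hypothetical).
def bySize[T](l: List[T]): Boolean = l.size == l.distinct.size
def byEquals[T](l: List[T]): Boolean = l == l.distinct
def byHashSet[T](l: List[T]): Boolean = {
  val seen = mutable.HashSet[T]()
  // add returns false for an element that is already present,
  // so forall stops at the first duplicate
  l.forall(x => seen.add(x))
}

// Hypothetical helper: call f `runs` times as warm-up, then time `runs` more calls.
def time(runs: Int)(f: => Boolean): Long = {
  var hits = 0  // consume the results so the calls are not trivially dead code
  var i = 0
  while (i < runs) { if (f) hits += 1; i += 1 }  // warm-up for the JIT
  val start = System.currentTimeMillis()
  i = 0
  while (i < runs) { if (f) hits += 1; i += 1 }
  System.currentTimeMillis() - start
}

// A unique list, and the same list with one item from the
// third quarter copied into the middle (assumed positions).
val unique = (1 to 10000).toList
val dup = unique.take(5000) ::: unique(6000) :: unique.drop(5000)
```

Calling, for example, `time(1000)(byHashSet(dup))` returns the elapsed milliseconds for 1000 timed calls; the numbers will differ by machine and JVM.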

tstenner

3 Answers

15

Here is the fastest purely functional solution I can think of:

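import scala.annotation.tailrec
import scala.collection.immutable.HashSet

def isUniqueList[T](l: List[T]): Boolean = isUniqueList1(l, HashSet.empty[T])

// s holds the elements seen so far
@tailrec
def isUniqueList1[T](l: List[T], s: Set[T]): Boolean = l match {
  case Nil => true
  case (h :: t) => if (s(h)) false else isUniqueList1(t, s + h)
}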

This should be faster, but uses a mutable data structure (based on the distinct implementation given by Vasil Remeniuk):

import scala.collection.mutable

def isUniqueList[T](l: List[T]): Boolean = {
  val seen = mutable.HashSet[T]()
  for (x <- l) {
    if (seen(x)) {
      return false  // duplicate found, stop early
    }
    else {
      seen += x
    }
  }
  true
}

And here is the simplest (equivalent to yours):

def isUniqueList[T](l: List[T]) = l.toSet.size == l.size
Alexey Romanov
  • 2
    you can change the `if (s.contains(h)) false else isUniqueList1(t, s + h)` to: `!(s(h) || !isUniqueList(t, s + h))` – oxbow_lakes Oct 06 '10 at 10:58
  • 4
    But then it is no longer tail recursive... I would at least apply de Morgan's rules and turn it into `!s(h) && isUniqueList(t, s + h)`, which is. – Alexey Romanov Oct 06 '10 at 11:08
  • There's nothing wrong with using a mutable data structure in the second example because the function as a whole is still referentially transparent. – Tom Crockett Oct 07 '11 at 20:21
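
The `&&` form suggested in the comments above could look like the sketch below. The name `allUnique` and the defaulted `seen` parameter are assumptions for illustration, not part of the answer. The recursive call is the right-hand operand of a short-circuiting `&&`, which the Scala compiler accepts as a tail call, so `@tailrec` should still compile.

```scala
import scala.annotation.tailrec

// Hypothetical name; seen accumulates the elements visited so far.
@tailrec
def allUnique[T](l: List[T], seen: Set[T] = Set.empty[T]): Boolean = l match {
  case Nil    => true
  // stops at the first repeated element; otherwise recurses in tail position
  case h :: t => !seen(h) && allUnique(t, seen + h)
}
```

With the list from the question, `allUnique(List(1,2,3,4,3))` is false because 3 appears twice.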
6

I would simply use the distinct method:

scala> val l = List(1,2,3,4,3)
l: List[Int] = List(1, 2, 3, 4, 3)

scala> l.distinct.size == l.size
res2: Boolean = false


ADD: The standard distinct implementation (from scala.collection.SeqLike) uses a mutable HashSet to find duplicate elements:

  def distinct: Repr = {
    val b = newBuilder
    val seen = mutable.HashSet[A]()
    for (x <- this) {
      if (!seen(x)) {
        b += x
        seen += x
      }
    }
    b.result
  }
Vasil Remeniuk
  • +1 for distinct, but wouldn't `l.distinct.size == l.size` be O(1) instead of O(n) for the comparison? – tstenner Oct 06 '10 at 10:35
  • 2
    No, since `List.size` is itself O(n). See http://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_8_0_final/src//library/scala/collection/immutable/List.scala#L1 to check that size has to be calculated by iterating over the list. – Alexey Romanov Oct 06 '10 at 10:44
  • You're right, but List.size can be implemented as O(1) whereas it's impractical for comparison. – tstenner Oct 07 '10 at 08:40
2

A more efficient method would be to attempt to find a dupe; this would return more quickly if one were found:

var dupes : Set[A] = Set.empty

def isDupe(a : A) = if (dupes(a)) true else { dupes += a; false }

//then
l exists isDupe 
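
A self-contained variant of the same idea, as a sketch: the wrapper names `hasDuplicate` and `noDuplicates` are assumptions for illustration. Keeping the `var` local to the function means each call starts with an empty set and no state leaks between calls.

```scala
// Hypothetical wrapper around the exists-based check.
def hasDuplicate[A](l: List[A]): Boolean = {
  var seen = Set.empty[A]  // local state, not visible to callers
  def isDupe(a: A): Boolean =
    if (seen(a)) true else { seen += a; false }
  l.exists(isDupe)         // exists stops at the first repeated element
}

def noDuplicates[A](l: List[A]): Boolean = !hasDuplicate(l)
```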
oxbow_lakes