
In Scala, how can I efficiently compare the contents of two lists/seqs regardless of their order, without sorting them (I don't know the element type)?

The lists/seqs may contain duplicates.

I have seen a somewhat similar discussion, but some of the answers there are incorrect or require sorting.

rapt
  • What's the output you want given these two lists? `val a = List(2, 3, 1, 2)` and `val b = List(3, 1, 2, 3)`? And what about `val x = List(2, 3, 1, 2)` and `val y = List(3, 1, 2, 2)`? – Onilton Maciel Feb 15 '16 at 21:28
  • @Onilton Maciel `a` and `b` are not equivalent: `a` has two `2`s and `b` doesn't. `x` and `y` are equivalent. – rapt Feb 15 '16 at 21:32
  • And what about the second? val x = List(2, 3, 1, 2) and val y = List(3, 1, 2, 2) – Onilton Maciel Feb 15 '16 at 21:32
  • Convert both into multisets (bags) and compare those. This is much the same as Onilton Maciel's answer, without those extra arrays of duplicated values. Multisets are (annoyingly) not in the standard library. – The Archetypal Paul Feb 15 '16 at 22:53
  • @The Archetypal Paul Could you give more details on where to find a multiset implementation? – rapt Feb 16 '16 at 04:58
  • There's one here: https://github.com/nicolasstucki/multisets But for your purpose, I don't think it's much better than the `groupBy` solution (because I think it still keeps a list of the elements with the same key value) – The Archetypal Paul Feb 16 '16 at 06:52
  • @The Archetypal Paul Thanks for the tip...! it's good to know. – rapt Feb 17 '16 at 16:53
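
Related to the multiset suggestion: the standard library has no multiset type, but `Seq.diff` is documented as computing the multiset difference of two sequences. A minimal sketch built on that, assuming elements with consistent `equals`/`hashCode`; `DiffCheck` and `sameElements` are illustrative names, not library API. Two sequences of equal length whose difference is empty must contain the same elements with the same multiplicities:

```scala
object DiffCheck {
  // a.diff(b) removes one occurrence from `a` for each matching occurrence in `b`
  // (multiset difference). If the lengths are equal and nothing is left over,
  // every element occurs equally often in both sequences.
  def sameElements[T](a: Seq[T], b: Seq[T]): Boolean =
    a.length == b.length && a.diff(b).isEmpty

  def main(args: Array[String]): Unit = {
    // Examples from the comments under the question
    println(sameElements(List(2, 3, 1, 2), List(3, 1, 2, 3))) // false
    println(sameElements(List(2, 3, 1, 2), List(3, 1, 2, 2))) // true
  }
}
```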

1 Answer


You can do

list1.groupBy(identity) == list2.groupBy(identity)

It's O(n) on average, since it's hash-based.
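
To show what the one-liner builds, here is a small sketch that wraps it in a hypothetical `sameElements` helper and runs it on the examples from the comments. Each map value is a list holding every occurrence of its key, which is where the extra space goes:

```scala
object GroupByCheck {
  // groupBy(identity) maps each distinct element to the list of all its occurrences,
  // so the two maps are equal exactly when every element occurs equally often in both.
  def sameElements[T](l1: Seq[T], l2: Seq[T]): Boolean =
    l1.groupBy(identity) == l2.groupBy(identity)

  def main(args: Array[String]): Unit = {
    // The intermediate map keeps every occurrence of each element
    val grouped = List(2, 3, 1, 2).groupBy(identity)
    println(grouped(2)) // List(2, 2)

    println(sameElements(List(2, 3, 1, 2), List(3, 1, 2, 3))) // false
    println(sameElements(List(2, 3, 1, 2), List(3, 1, 2, 2))) // true
  }
}
```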

If creating the temporary lists is an issue for you, you could create a helper method that keeps only a count for each element instead of all of its occurrences:

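// Counts how many times each element occurs; stores one Int per distinct element
def counter[T](l: Seq[T]): Map[T, Int] =
  l.foldLeft(Map.empty[T, Int].withDefaultValue(0)) { (m, x) =>
    m + (x -> (m(x) + 1))
  }

counter(list1) == counter(list2)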
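
The same counting idea can be written more compactly with `groupMapReduce`, which is only available from Scala 2.13 on; this sketch assumes 2.13+ and also rejects inputs of different lengths before counting. `CountCheck`, `counts` and `sameElements` are illustrative names:

```scala
object CountCheck {
  // Assumes Scala 2.13 or later (groupMapReduce does not exist in 2.12).
  // Maps each distinct element to its number of occurrences.
  def counts[T](s: Seq[T]): Map[T, Int] =
    s.groupMapReduce(identity)(_ => 1)(_ + _)

  // Sequences of different lengths can never match, so skip the counting for them.
  def sameElements[T](a: Seq[T], b: Seq[T]): Boolean =
    a.length == b.length && counts(a) == counts(b)

  def main(args: Array[String]): Unit = {
    println(counts(List(2, 3, 1, 2))(2))                        // 2
    println(sameElements(List(2, 3, 1, 2), List(3, 1, 2, 2)))  // true
    println(sameElements(List("a", "b"), List("a", "b", "b"))) // false
  }
}
```
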
Onilton Maciel
  • alternatively: `.sortBy(_.hashCode)` – 0__ Feb 15 '16 at 21:40
  • Yes, it's mentioned in the link I provided. For each list of n different elements, it would create n lists. How efficient is this? – rapt Feb 15 '16 at 21:51
  • @rapt You should add to your question that you want to avoid this space complexity. As it stands, I think my solution solves the problem described in your question. If you don't want the space complexity, go with the sort solution. There's no free lunch, you know. – Onilton Maciel Feb 15 '16 at 21:58
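
If the per-element lists are the main concern, one possible refinement keeps a single mutable map of counts: it counts up over the first sequence, counts down over the second, and stops at the first element of the second sequence that the first one does not have enough of. This is a sketch assuming elements with consistent `equals`/`hashCode`; `SingleMapCheck` is an illustrative name. It still needs space proportional to the number of distinct elements, so it narrows the trade-off rather than removing it:

```scala
import scala.collection.mutable

object SingleMapCheck {
  def sameElements[T](a: Seq[T], b: Seq[T]): Boolean =
    a.length == b.length && {
      // Count the occurrences of each element of `a`
      val counts = mutable.HashMap.empty[T, Int]
      a.foreach(x => counts.update(x, counts.getOrElse(x, 0) + 1))
      // With equal lengths, the inputs match exactly when no count drops below zero,
      // so forall can stop at the first over-represented element of `b`.
      b.forall { x =>
        val remaining = counts.getOrElse(x, 0) - 1
        counts.update(x, remaining)
        remaining >= 0
      }
    }

  def main(args: Array[String]): Unit = {
    println(sameElements(List(2, 3, 1, 2), List(3, 1, 2, 2))) // true
    println(sameElements(List(2, 3, 1, 2), List(3, 1, 2, 3))) // false
  }
}
```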