16

I have a List[A], how is a idiomatic way of removing duplicates given an equality function (a:A, b:A) => Boolean? I cannot generally override equalsfor A

The way I can think now is creating a wrapping class AExt with overridden equals, then

list.map(new AExt(_)).distinct

But I wonder if there's a cleaner way.

Johnny Everson
  • 8,343
  • 7
  • 39
  • 75
  • You might be interested in looking at this answer: http://stackoverflow.com/questions/12739432/how-to-implement-a-set-with-a-user-defined-equality – Eric Oct 05 '12 at 13:54
  • 1
    Possible duplicate of [Scala: Remove duplicates in list of objects](http://stackoverflow.com/questions/3912753/scala-remove-duplicates-in-list-of-objects) –  Oct 18 '16 at 21:40
  • 2
    @ptay89 although the solution is the same, I believe it is a different question. – Johnny Everson Oct 19 '16 at 12:32

6 Answers6

19

There is a simple (simpler) way to do this:

list.groupBy(_.key).mapValues(_.head)

If you want you can use the resulting map instantly by replacing _.head by a function block like:

sameElements => { val observedItem = sameElements.head
                  new A (var1 = observedItem.firstAttr,
                         var2 = "SomethingElse") }

to return a new A for each distinct element.

There is only one minor problem. The above code (list.groupBy(_.key).mapValues(_.head)) didnt explains very well the intention to remove duplicates. For that reason it would be great to have a function like distinctIn[A](attr: A => B) or distinctBy[A](eq: (A, A) -> Boolean).

luno1977
  • 199
  • 2
  • 2
  • 8
    This way the result will be a map on the unique key. To get just distinct values, one should .map(_._2.head) – ubik Sep 02 '15 at 16:59
  • 1
    This works in most cases; care should be taken in lazy evaluation cases, because `groupBy` tends to imply eager evaluation. – Dan Gravell Oct 21 '16 at 10:15
8

Using the Foo and customEquals from misingFaktor's answer:

  case class Foo(a: Int, b: Int)
  val (a, b, c, d) = (Foo(3, 4), Foo(3, 1), Foo(2, 5), Foo(2, 5))
  def customEquals(x: Foo, y: Foo) = x.a == y.a

  (Seq(a, b, c, d).foldLeft(Seq[Foo]()) {
    (unique, curr) => {
      if (!unique.exists(customEquals(curr, _)))
        curr +: unique
      else
        unique
    }
  }).reverse

If result ordering is important but the duplicate to be removed is not, then foldRight is preferable

  Seq(a, b, c, d).foldRight(Seq[Foo]()) {
    (curr, unique) => {
      if (!unique.exists(customEquals(curr, _)))
        curr +: unique
      else
        unique
    }
  }
drstevens
  • 2,903
  • 1
  • 21
  • 30
4

I must say I think I'd go via an intermediate collection which was a Set if you expected that your Lists might be quite long as testing for presence (via exists or find) on a Seq is O(n) of course:

Rather than write a custom equals; decide what property the elements are equal by. So instead of:

def myCustomEqual(a1: A, a2: A) = a1.foo == a2.foo && a1.bar == a2.bar

Make a Key. Like so:

type Key = (Foo, Bar)
def key(a: A) = (a.foo, a.bar)

Then you can add the keys to a Set to see whether you have come across them before.

var keys = Set.empty[Key]
((List.empty[A] /: as) { (l, a) => 
  val k = key(a)
  if (keys(k)) l else { keys += k; a +: l  }
}).reverse

Of course, this solution has worse space complexity and potentially worse performance (as you are creating extra objects - the keys) in the case of very short lists. If you do not like the var in the fold, you might like to look at how you could achieve this using State and Traverse from scalaz 7

oxbow_lakes
  • 133,303
  • 56
  • 317
  • 449
3

Starting Scala 2.13, we can use the new distinctBy method which returns elements of a sequence ignoring the duplicates as determined by == after applying a transforming function f:

def distinctBy[B](f: (A) => B): List[A]

For instance:

// case class A(a: Int, b: String, c: Double)
// val list = List(A(1, "hello", 3.14), A(2, "world", 3.14), A(1, "hello", 12.3))
list.distinctBy(x => (x.a, x.b)) // List(A(1, "hello", 3.14), A(2, "world", 3.14))
list.distinctBy(_.c)             // List(A(1, "hello", 3.14), A(1, "hello", 12.3))
Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
3
scala> case class Foo(a: Int, b: Int)
defined class Foo

scala> val (a, b, c, d) = (Foo(3, 4), Foo(3, 1), Foo(2, 5), Foo(2, 5))
a: Foo = Foo(3,4)
b: Foo = Foo(3,1)
c: Foo = Foo(2,5)
d: Foo = Foo(2,5)

scala> def customEquals(x: Foo, y: Foo) = x.a == y.a
customEquals: (x: Foo, y: Foo)Boolean

scala> Seq(a, b, c, d) filter {
     |   var seq = Seq.empty[Foo]
     |   x => {
     |    if(seq.exists(customEquals(x, _))) {
     |      false 
     |    } else { 
     |      seq :+= x
     |      true 
     |    }
     | }
res13: Seq[Foo] = List(Foo(3,4), Foo(2,5))
missingfaktor
  • 90,905
  • 62
  • 285
  • 365
  • This answer is inspired by the implementation of `distinct`: https://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_9_1_final/src//library/scala/collection/SeqLike.scala#L501 So looking at the source code can sometimes help. – Christopher Chiche Apr 02 '12 at 19:32
  • 1
    No, this answer not inspired by that implementation. I wrote something that came to my mind. This is actually not even a good solution. @drstevens' answer is better. – missingfaktor Apr 02 '12 at 19:40
  • My bad, I looked for how it was implemented in scala, as I consider it as a good source of inspiration and then saw that it was the same form as your implementation. However I agree with you that a `foldleft` is more elegant. – Christopher Chiche Apr 02 '12 at 19:43
-1
case class Foo (a: Int, b: Int)

val x = List(Foo(3,4), Foo(3,1), Foo(2,5), Foo(2,5))
def customEquals(x : Foo, y: Foo) = (x.a == y.a && x.b == y.b)

x.foldLeft(Nil : List[Foo]) {(list, item) => 
   val exists = list.find(x => customEquals(item, x))
   if (exists.isEmpty) item :: list
   else list
 }.reverse

res0: List[Foo] = List(Foo(3,4), Foo(3,1), Foo(2,5))

RStrad
  • 152
  • 7