
Usually I call distinct on a List to remove duplicates, or turn it into a Set. Now I have a List[MyObject], where MyObject is a case class:

case class MyObject(s1: String, s2: String, s3: String)

Let's say we have the following cases:

val myObj1 = MyObject("", "gmail,com", "some text")
val myObj2 = MyObject("", "gmail,com", "")
val myObj3 = MyObject("some text", "gmail.com", "")
val myObj4 = MyObject("some text", "gmail.com", "some text")
val myObj5 = MyObject("", "ymail.com", "")
val myObj6 = MyObject("", "ymail.com", "some text")

val myList = List(myObj1, myObj2, myObj3, myObj4, myObj5, myObj6)

Two questions:

  1. How can I count how many objects are affected, i.e. how many are duplicates based on the content of s2?
  2. How can I make the List distinct based on s2? I would consider two MyObject instances the same when their s2 values are equal. Do I need to turn the case class into a normal class and override equals? Do I need my own Comparator for this, or can I use some Scala API method to achieve the same?
Yuval Itzchakov
user3350744
    These are two questions, and should be split that way. #1 is unclear to me. #2 is a duplicate of http://stackoverflow.com/questions/3912753/scala-remove-duplicates-in-list-of-objects – Michael Zajac Aug 11 '16 at 15:20
  • Question 1 means: how can I see how many MyObject objects have the same content in s2, no matter what's in s1 or s3? I only care about s2 here. Question 2 means: I just want to keep a single MyObject per s2 value, and I don't care which one. The resulting list should be distinct based on the case class property s2. So, myList from above would just have 2 entries after the transformation. – user3350744 Aug 11 '16 at 15:37

2 Answers


How can I count how many objects are affected? Duplicates based on the content of s2?

To count how many objects are in each group of duplicates (if you only want to know how many objects are going to be removed, subtract 1 from each size):

myList.groupBy(_.s2).map { case (s2, objs) => (s2, objs.size) }
res0: scala.collection.immutable.Map[String,Int] = Map(ymail.com -> 2, gmail.com -> 2, gmail,com -> 2)
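Building on the groupBy counts above, here is a small sketch (written for the Scala REPL or a scala-cli script; the names counts, duplicateGroups, removedCount and removedCount2 are illustrative, not from the question) that keeps only the s2 values occurring more than once and totals how many objects a distinct-by-s2 pass would drop:

```scala
case class MyObject(s1: String, s2: String, s3: String)

val myList = List(
  MyObject("", "gmail,com", "some text"),
  MyObject("", "gmail,com", ""),
  MyObject("some text", "gmail.com", ""),
  MyObject("some text", "gmail.com", "some text"),
  MyObject("", "ymail.com", ""),
  MyObject("", "ymail.com", "some text")
)

// Number of objects per s2 value
val counts: Map[String, Int] =
  myList.groupBy(_.s2).map { case (s2, objs) => (s2, objs.size) }

// Only the s2 values that occur more than once
val duplicateGroups: Map[String, Int] = counts.filter { case (_, n) => n > 1 }

// Objects that would be removed: each duplicate group keeps one and drops (size - 1)
val removedCount: Int = duplicateGroups.valuesIterator.map(_ - 1).sum

// Same number via a shortcut: total size minus the number of distinct s2 values
val removedCount2: Int = myList.size - myList.map(_.s2).distinct.size
```

With the sample data every s2 value appears twice, so each of the three groups contributes one removed object.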

How can I make the List distinct based on s2?

myList.groupBy(_.s2).map(_._2.head)
res1: scala.collection.immutable.Iterable[MyObject] = List(MyObject(,ymail.com,), MyObject(some text,gmail.com,), MyObject(,gmail,com,some text))
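Because groupBy returns a Map, the order of that result is not the order of myList (within each group groupBy does keep the original order, so head is the first occurrence). If the original order matters, a fold that remembers the keys already seen works on any Scala version. This is a sketch; distinctByKey is a hypothetical helper name, not a library method:

```scala
case class MyObject(s1: String, s2: String, s3: String)

val myList = List(
  MyObject("", "gmail,com", "some text"),
  MyObject("", "gmail,com", ""),
  MyObject("some text", "gmail.com", ""),
  MyObject("some text", "gmail.com", "some text"),
  MyObject("", "ymail.com", ""),
  MyObject("", "ymail.com", "some text")
)

// Hypothetical helper: keep the first element for each key, preserving list order.
def distinctByKey[A, K](xs: List[A])(key: A => K): List[A] = {
  val (_, keptReversed) = xs.foldLeft((Set.empty[K], List.empty[A])) {
    case ((seen, acc), x) =>
      val k = key(x)
      if (seen.contains(k)) (seen, acc) // key already kept: skip this element
      else (seen + k, x :: acc)         // first time this key appears: keep it
  }
  keptReversed.reverse // elements were prepended, so restore the original order
}

val distinctMyList: List[MyObject] = distinctByKey(myList)(_.s2)
```

The key function decides what counts as a duplicate, so MyObject can stay a plain case class without an overridden equals.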
Yuval Itzchakov

Here is a slightly safer way:

myList.groupBy(_.s2).values.flatMap(_.headOption).toList

Alternatively:

scala.reflect.internal.util.Collections.distinctBy(myList)(_.s2)
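scala.reflect.internal.util.Collections belongs to the internal package of the scala-reflect module, which is compiler infrastructure rather than a supported public API. Assuming you are on Scala 2.13 or later (including Scala 3), the standard collections offer distinctBy directly; a minimal sketch:

```scala
// Assumes Scala 2.13+ or Scala 3, where List has a built-in distinctBy.
case class MyObject(s1: String, s2: String, s3: String)

val myList = List(
  MyObject("", "gmail,com", "some text"),
  MyObject("", "gmail,com", ""),
  MyObject("some text", "gmail.com", ""),
  MyObject("some text", "gmail.com", "some text"),
  MyObject("", "ymail.com", ""),
  MyObject("", "ymail.com", "some text")
)

// Keeps the first MyObject for each distinct s2 value, in the original list order.
// No custom equals or Comparator is needed: the key function defines "duplicate".
val distinctByS2: List[MyObject] = myList.distinctBy(_.s2)
```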

Abel Terefe