Using the code below, I'm attempting to remove duplicate elements with the distinct method on List:

class OrderDetSpecific(var size: Double,
                       var side: String,
                       var trade_id: Int,
                       var price: Double,
                       var time: String,
                       var code : String) {

  override def toString : String = {
    this.code+","+this.time+","+this.trade_id+","+this.price
  }

}


val l = List(new OrderDetSpecific(1.0 , "1" , 1 , 1.0 , "10" , "a"),new OrderDetSpecific(1.0 , "1" , 1 , 1.0 , "10" , "a"))

println(l.size)
println(l.distinct.size)

returns:

defined class OrderDetSpecific

l: List[OrderDetSpecific] = List(a,10,1,1.0, a,10,1,1.0)

2
2

But as you can see, the duplicated elements are not removed. I expected the overridden toString method to be used when detecting duplicates, and since duplicate entries exist, shouldn't l.distinct.size return 1 instead of 2?

Update:

Converting to a case class:

case class OrderDetSpecific(var size: Double,
                       var side: String,
                       var trade_id: Int,
                       var price: Double,
                       var time: String,
                       var code : String) {

  override def toString : String = {
    this.code+","+this.time+","+this.trade_id+","+this.price
  }

}


val l = List(new OrderDetSpecific(1.0 , "1" , 1 , 1.0 , "10" , "a"),new OrderDetSpecific(1.0 , "1" , 1 , 1.0 , "10" , "a"))

println(l.size)
println(l.distinct.size)

Now the duplicated elements are removed. Does a case class use value equality in equals, which allows distinct to behave as expected?

When I override equals:

class OrderDetSpecific(var size: Double,
                       var side: String,
                       var trade_id: Int,
                       var price: Double,
                       var time: String,
                       var code : String) {

  override def toString : String = {
    this.code+","+this.time+","+this.trade_id+","+this.price
  }

  override def equals(that: Any): Boolean =
    that match {
      case that: OrderDetSpecific => {
        time == that.time
      }
      case _ => false
    }

}


val l = List(new OrderDetSpecific(1.0 , "1" , 1 , 1.0 , "10" , "a"),new OrderDetSpecific(1.0 , "1" , 1 , 1.0 , "10" , "a"))

println(l.size)
println(l.distinct.size)

The duplicate element is not removed. As my override is on the time attribute, which is not distinct, shouldn't the duplicated element be removed?

Mario Galic
blue-sky
  • `distinct` internally uses `equals`, which for normal classes uses **reference equality** instead of **value equality**, so all those elements are different regardless of whether they have the same values _(also, given all those values are mutable, their having the same values is just a coincidence that may change later)_. So it is behaving as expected. If you want, you could either make all those values immutable and use a **case class** instead, or you can override `equals` yourself _(but remember that for mutable values, **reference equality** is the only "sane" implementation)_. – Luis Miguel Mejía Suárez Jan 05 '20 at 01:27
  • it's not evil to just add `case` for a holder of vars in a list. But prefer immutable and careful with Maps and Sets. – som-snytt Jan 05 '20 at 04:11
  • @texasbruce probably he was talking about "human readability" to verify the output. – sarveshseri Jan 05 '20 at 13:56
  • @Luis Miguel Mejía Suárez thanks, please see question update. Why is it " for mutable values, reference equality is the only "sane" implementation " ? – blue-sky Jan 05 '20 at 14:42
  • @blue-sky a **case class** with mutable values is a code smell. For your own equals, you probably also need to override `canEqual` & `hashCode`; you can search for that by googling _"effective Java custom equals"_, which is why in **Scala** we usually do not worry about that and just use `case classes`. Finally, for mutable values one may argue that they are only equal if they are exactly the same instance. **Value equality** for something whose values can change doesn't make sense. – Luis Miguel Mejía Suárez Jan 05 '20 at 15:15

1 Answer

As my override is on the time attribute, which is not distinct, shouldn't the duplicated element be removed?

No. In Scala 2.13, distinct is equivalent to distinctBy(identity), and distinctBy uses a HashSet, which relies on hashCode as well as equals to eliminate duplicates. You have overridden equals but not hashCode, so the two instances will almost certainly still have different hash codes. For example, without a hashCode override

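import scala.collection.mutable

// equals always returns true, but hashCode is still the default one
class Foo(var x: Int) {
  override def equals(obj: Any): Boolean = true
}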

val a = new Foo(42)
val b = new Foo(42)

a.## == b.##
mutable.HashSet(a, b).size == 1

outputs

res0: Boolean = false
res1: Boolean = false

whilst with a hashCode override provided

class Foo(var x: Int) {
  override def equals(obj: Any): Boolean = true
  override def hashCode(): Int = scala.runtime.Statics.anyHash(x)
}
...

we get

res0: Boolean = true
res1: Boolean = true
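
Applied to the class from the question, the same idea might look like the sketch below. It assumes that equality should depend on time alone, as in the question's equals override. The class is trimmed to two fields for brevity, and the wrapper object DistinctDemo and the names orders and deduped are purely illustrative.

```scala
object DistinctDemo {
  // Trimmed-down, hypothetical version of the question's class.
  class OrderDetSpecific(var time: String, var code: String) {
    override def toString: String = code + "," + time

    // Two orders are considered equal when their time matches.
    override def equals(that: Any): Boolean = that match {
      case other: OrderDetSpecific => time == other.time
      case _                       => false
    }

    // Must agree with equals: equal objects need equal hash codes.
    override def hashCode(): Int = time.##
  }

  val orders: List[OrderDetSpecific] =
    List(new OrderDetSpecific("10", "a"), new OrderDetSpecific("10", "a"))

  // With both overrides in place, distinct collapses the two entries.
  val deduped: List[OrderDetSpecific] = orders.distinct
}
```

Because time is a var, changing it after an instance has been stored in a hash-based collection also changes its hash code, which is the hazard raised in the comments.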

However, there is no need to fiddle with these overrides. Instead, try distinctBy (available since Scala 2.13):

l.distinctBy(_.time)

which outputs

res0: List[OrderDetSpecific] = List(a,10,1,1.0)
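
As a further sketch of how distinctBy behaves, the example below uses a small immutable stand-in for the question's class. The names DistinctByDemo, Order, byTime and byTimeAndCode are hypothetical. It assumes Scala 2.13 or later, shows that the first element seen for each key is the one kept, and shows that a tuple can serve as a composite key.

```scala
object DistinctByDemo {
  // Hypothetical immutable stand-in for the question's class.
  final case class Order(tradeId: Int, time: String, code: String)

  val orders: List[Order] = List(
    Order(1, "10", "a"),
    Order(2, "10", "b"),
    Order(3, "11", "a")
  )

  // Key on a single field: the first element seen for each time is kept.
  val byTime: List[Order] = orders.distinctBy(_.time)

  // Key on several fields at once by returning a tuple.
  val byTimeAndCode: List[Order] = orders.distinctBy(o => (o.time, o.code))
}
```

Here byTime keeps the orders with tradeId 1 and 3, while byTimeAndCode keeps all three because no two orders share both time and code.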
Mario Galic