
I just stumbled on one of Tony Morris' blog posts about Java and a fundamental problem with the language: defining a bespoke equality relation for a collection. I think this is a big deal, and I wondered whether there is a Scala solution.

The classic issue shows up when you think about, say, a trade. Let's say I make two trades of +100 Vodafone shares @150p. The two trades are equal, yes? Except they are not the same trade. And in a normal real-world system, with persistence or serialization, I cannot rely on identity to tell me whether two references are to the same trade!

So what I want is to be able to create a collection to which I can pass an equality relation:

val as = CleverSet[Trade](IdEquality)
val bs = CleverSet[Trade](EconomicsEquality)

How would I implement my set efficiently? As far as I can see, that is only possible if the EqualityRelation also defines a hash mechanism:

trait EqualityRelation[T] {
  def equal(t1: T, t2: T) : Boolean
  def hash(t: T) : Int
}

So the questions are:

  • Is there a library which provides this ability?
  • Is there some way of doing this neatly in Scala?

It seems that, with implicits, this would be quite an easy thing to add to the existing Scala Set type.

oxbow_lakes
  • There's Equal in Scalaz: http://github.com/scalaz/scalaz/blob/master/example/src/main/scala/scalaz/ExampleEqual.scala. But I'm not familiar enough to say what builds on it. – Thomas Jung Feb 26 '10 at 13:23
  • I think that's just a typesafe equals, so that `"Hello" === 2` does not compile – oxbow_lakes Feb 26 '10 at 16:20
  • scalaz.Equal is not just type safe, it's also flexible. `Equal[List[Foo]]` is parameterisable by an `Equal[Foo]`. This goes half-way towards your goal. Martin Odersky declined to add `Hash[T]` to the standard library, saying that "we want to maintain universal hashing, it's too much part of the Java culture." http://www.scala-lang.org/node/4091#comment-16327 – retronym Feb 26 '10 at 18:03

3 Answers


This can already be achieved with Java's TreeSet and a Comparator implementation:

TreeSet<String> ignoreCase = new TreeSet<String>(new Comparator<String>(){
    @Override
    public int compare(String o1, String o2) {
        return o1.compareToIgnoreCase(o2);
    }});

TreeSet<String> withCase = new TreeSet<String>();

List<String> values = asList("A", "a");
ignoreCase.addAll(values);
withCase.addAll(values);

Output:

ignoreCase -> [A]
withCase -> [A, a]

This has two drawbacks: the Comparator you have to implement is more powerful than needed, and you're restricted to collections that support Comparators. As oxbow_lakes pointed out, the Comparator approach also breaks the Set contract: for two elements with !a.equals(b), a fresh set can return set.add(a) == true and then set.add(b) == false.
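To make the contract point concrete, here is a minimal sketch that drives java.util.TreeSet and java.util.HashSet from Scala. The object name ContractDemo and the value names are made up for illustration; String.CASE_INSENSITIVE_ORDER is the JDK's built-in case-insensitive comparator, used here in place of the anonymous Comparator above.

```scala
import java.util.{HashSet => JHashSet, TreeSet => JTreeSet}

// ContractDemo and its member names are hypothetical, for illustration only.
object ContractDemo {
  // A TreeSet decides membership with the comparator, not with equals.
  val tree = new JTreeSet[String](String.CASE_INSENSITIVE_ORDER)
  val treeAddedUpper: Boolean = tree.add("A")
  // "A" and "a" are not equal, yet the comparator reports 0, so this add is rejected.
  val treeAddedLower: Boolean = tree.add("a")

  // A HashSet uses equals/hashCode, so both strings are accepted.
  val hash = new JHashSet[String]()
  val hashAddedUpper: Boolean = hash.add("A")
  val hashAddedLower: Boolean = hash.add("a")
}
```

Code written against the Set interface would expect the second add to succeed in both cases, which is why handing such a TreeSet around as a plain Set can surprise callers.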

Scala's TreeSet supports the same approach through a view transformation A => Ordered[A]:

scala> new scala.collection.immutable.TreeSet[String]()(x=> x.toLowerCase) + "a" + "A"
res0: scala.collection.immutable.TreeSet[String] = Set(A)
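The REPL line above relies on the view-bound form of TreeSet. On later Scala versions TreeSet takes an Ordering instead, so a rough equivalent, assuming Scala 2.13, might look like the sketch below. OrderingDemo and the value names are invented for the example.

```scala
import scala.collection.immutable.TreeSet

// OrderingDemo and its member names are hypothetical; assumes the Scala 2.13 collections.
object OrderingDemo {
  // Strings compare as equal when their lower-cased forms match.
  val ignoreCaseOrdering: Ordering[String] = Ordering.by((s: String) => s.toLowerCase)

  // The custom ordering is passed explicitly in place of the implicit default.
  val ignoreCase: TreeSet[String] = TreeSet.empty[String](ignoreCaseOrdering) + "a" + "A"

  // With the default ordering, "a" and "A" are distinct elements.
  val withCase: TreeSet[String] = TreeSet.empty[String] + "a" + "A"
}
```

Here ignoreCase ends up with a single element and answers contains for either spelling, while withCase keeps both strings.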
Thomas Jung
  • And you are forced down the route of the O(log(n)) access time of a tree structure – oxbow_lakes Feb 26 '10 at 14:38
  • Also, from the JavaDoc of `TreeSet`: *Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface*, so it seems that whilst this approach works, it doesn't quite feel right – oxbow_lakes Feb 26 '10 at 14:52
  • This is a valid point. Breaking the Set contract is a bad idea. If you hand this TreeSet around it has to be wrapped. The CleverSet you suggested would break it in the same way. It could only implement Iterable[T] not Set[T]. – Thomas Jung Feb 26 '10 at 15:35
  • Only because `Set` is defined in terms of `equals`. Oh, how I wish it had been defined in terms of a (pluggable) equality relation – oxbow_lakes Feb 26 '10 at 15:58
  • @oxbow_lakes - If you go down this road (flexibility) there will only be getClass left on the java.lang.Object interface. – Thomas Jung Feb 26 '10 at 16:07
  • @ThomasJung Sounds like an improvement ;) – vossad01 Dec 12 '16 at 00:36

I know you're asking about Scala, but it's worth comparing with what the .NET collections offer. In particular, all hash-based collections (e.g. Dictionary<TKey, TValue> and HashSet<T>) can take an instance of IEqualityComparer<T>. This is similar to Scala's Equiv[T], but also supplies a custom hash code. You could create a similar trait by extending Equiv:

trait HashEquiv[T] extends Equiv[T] {
  def hashOf(t: T) : Int
}

To be fully supported, hash-based collections would need to take a HashEquiv as an implicit parameter at construction and use its equiv and hashOf methods instead of the Object instance methods (much as TreeSet and friends do with the Ordered trait, but in reverse). There would also need to be an implicit conversion from Any to HashEquiv that falls back on the intrinsic equals and hashCode implementations.
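As a rough sketch of how that could look without touching the standard collections: wrap each element in a key whose equals and hashCode delegate to the HashEquiv, and keep the keys in an ordinary immutable Set. HashEquiv follows the trait above; EquivSet, Key, Trade, TradeDemo and the two equality instances are hypothetical names, and the implicit default in the companion stands in for the fallback to intrinsic equals and hashCode.

```scala
trait HashEquiv[T] extends Equiv[T] {
  def hashOf(t: T): Int
}

object HashEquiv {
  // Fallback instance: defers to the element's own equals and hashCode.
  implicit def universal[T]: HashEquiv[T] = new HashEquiv[T] {
    def equiv(x: T, y: T): Boolean = x == y
    def hashOf(t: T): Int = t.##
  }
}

// Hypothetical set: stores wrapper keys in a normal immutable Set.
final class EquivSet[T] private (keys: Set[EquivSet.Key[T]])(implicit he: HashEquiv[T]) {
  def +(t: T): EquivSet[T] = new EquivSet[T](keys + new EquivSet.Key(t, he))
  def contains(t: T): Boolean = keys.contains(new EquivSet.Key(t, he))
  def size: Int = keys.size
}

object EquivSet {
  // Key routes equals/hashCode through the supplied HashEquiv.
  final class Key[T](val value: T, he: HashEquiv[T]) {
    override def hashCode: Int = he.hashOf(value)
    override def equals(other: Any): Boolean = other match {
      case k: Key[_] => he.equiv(value, k.asInstanceOf[Key[T]].value)
      case _         => false
    }
  }
  def empty[T](implicit he: HashEquiv[T]): EquivSet[T] = new EquivSet[T](Set.empty)
}

// Hypothetical trade model, loosely following the question's example.
final case class Trade(id: Long, instrument: String, quantity: Int, pricePence: Int)

object TradeDemo {
  val idEquality: HashEquiv[Trade] = new HashEquiv[Trade] {
    def equiv(a: Trade, b: Trade): Boolean = a.id == b.id
    def hashOf(t: Trade): Int = t.id.##
  }
  val economicsEquality: HashEquiv[Trade] = new HashEquiv[Trade] {
    def equiv(a: Trade, b: Trade): Boolean =
      a.instrument == b.instrument && a.quantity == b.quantity && a.pricePence == b.pricePence
    def hashOf(t: Trade): Int = (t.instrument, t.quantity, t.pricePence).##
  }

  val t1 = Trade(1L, "VOD", 100, 150)
  val t2 = Trade(2L, "VOD", 100, 150)

  val byId = EquivSet.empty[Trade](idEquality) + t1 + t2               // ids differ
  val byEconomics = EquivSet.empty[Trade](economicsEquality) + t1 + t2 // same economics
  val byDefault = EquivSet.empty[Trade] + t1 + t2                      // falls back to case-class equals
}
```

The wrapper costs one allocation per lookup, and EquivSet deliberately does not extend Set[T], since it would break the Set contract in the same way the TreeSet approach does.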

Ben Lings

You're describing the concept of a hashing strategy. The Trove library includes sets and maps that can be constructed with hashing strategies.
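For readers who have not met the term, the idea can be sketched by hand in a few lines. This is not Trove's API: HashingStrategy, StrategySet, StrategyDemo and their method names are hypothetical, chosen only to show a collection that asks a strategy object, rather than the element, for its hash and equality.

```scala
// Hypothetical names throughout; this is an illustration, not Trove's API.
trait HashingStrategy[T] {
  def hashOf(t: T): Int
  def areEqual(a: T, b: T): Boolean
}

// Immutable set that buckets elements by the strategy's hash,
// then scans the bucket using the strategy's equality.
final class StrategySet[T] private (strategy: HashingStrategy[T], buckets: Map[Int, List[T]]) {
  def contains(t: T): Boolean =
    buckets.getOrElse(strategy.hashOf(t), Nil).exists(x => strategy.areEqual(x, t))

  def +(t: T): StrategySet[T] =
    if (contains(t)) this
    else {
      val h = strategy.hashOf(t)
      new StrategySet[T](strategy, buckets.updated(h, t :: buckets.getOrElse(h, Nil)))
    }

  def size: Int = buckets.valuesIterator.map(_.size).sum
}

object StrategySet {
  def empty[T](strategy: HashingStrategy[T]): StrategySet[T] =
    new StrategySet[T](strategy, Map.empty)
}

object StrategyDemo {
  // Hash and equality are both derived from the lower-cased form, so they stay consistent.
  val ignoreCase: HashingStrategy[String] = new HashingStrategy[String] {
    def hashOf(s: String): Int = s.toLowerCase.hashCode
    def areEqual(a: String, b: String): Boolean = a.toLowerCase == b.toLowerCase
  }

  val names = StrategySet.empty(ignoreCase) + "Vodafone" + "VODAFONE" + "BT"
}
```

The one rule a strategy must follow is the usual one: elements that areEqual must also share a hashOf value, otherwise lookups land in the wrong bucket.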

Craig P. Motlin