Why does the Oracle Java API documentation for the add() method for TreeSet and HashSet state that:

an element e is added only if there is no e2 in the set where (e==null ? e2==null : e.equals(e2))

However, TreeSet appears to use compareTo(), while HashSet appears to use hashCode(), to determine equality, and both seem to ignore the result of equals(). Is the documentation inaccurate, or is my understanding of the convention or the algorithm faulty?

anatolyg

2 Answers

You are correct that the TreeSet documentation is incorrect.

You are incorrect about HashSet, as it does use equals(). hashCode() is not used for equality testing; it only narrows the search to candidate elements quickly, and equals() makes the final decision among those candidates.
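
A minimal sketch of that point, assuming a hypothetical element class `Name` (not from the question) whose hashCode() always returns the same value. With the hash code carrying no information, the outcome of add() is decided by equals() alone:

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashDemo {
    // Hypothetical element type: every instance shares one hash code,
    // so only equals() can tell instances apart.
    static final class Name {
        final String value;

        Name(String value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Name && ((Name) o).value.equals(value);
        }

        @Override
        public int hashCode() {
            return 42; // legal under the hashCode() contract, just slow
        }
    }

    static int sizeAfterAdds() {
        Set<Name> set = new HashSet<>();
        set.add(new Name("a"));
        set.add(new Name("a")); // equals() says true: rejected
        set.add(new Name("b")); // same hash, equals() says false: added
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdds()); // prints 2
    }
}
```

A constant hash code still satisfies the hashCode() contract, so this case is within what the add() documentation promises; it only costs lookup speed.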

jtahlborn
  • If hashCode() is set to always return the same code then the behaviour of add() does appear to depend upon equals(). But if I create an object where equals() always returns 'true' and don't override hashCode() - so it differs for different objects - then add() will add multiple objects. So the behaviour of add() seems more complex than as stated in the docs. As Louis says, these situations are outside of the contract, but I was interested in the implementation (as it wasn't doing what it said it would) - anyone know whether there is a canonical specification? – user3038094 Dec 02 '13 at 21:30
  • @user3038094 - I would recommend reading up on how HashMap works; then you will understand why the `equals()` method is not always called. Regardless, in the general sense, HashSet _does_ use `equals()` for equality testing. – jtahlborn Dec 02 '13 at 21:40
  • Thanks jtahlborn - I have followed your advice and see what you mean. If the hashCode() is different then the algorithm does not bother evaluating equals(), it only uses equals() if hashCode() is the same. Presumably this is because hashCode() is assumed to be faster to evaluate than equals(), or is it to do with the fact that hashCode() will have to be evaluated? So it would be better to say in the docs: e==null ? e2==null : e.hashCode() == e2.hashCode() ? e.hashCode() == e2.hashCode() : e.equals(e2)==0 – user3038094 Dec 04 '13 at 11:46
  • Sorry - ran out of edit time: e==null ? e2==null : e.hashCode() == e2.hashCode() ? True : e.equals(e2) – user3038094 Dec 04 '13 at 11:53
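
For reference, the check discussed in these comments can be probed directly. As far as I know, the OpenJDK implementation treats two elements as duplicates only when their hash codes match and equals() returns true (or they are the same reference), which is an implementation detail rather than part of the documented contract. The sketch below assumes a hypothetical class `AlwaysEqual` that deliberately breaks the equals()/hashCode() contract:

```java
import java.util.HashSet;
import java.util.Set;

class BrokenContractDemo {
    // Hypothetical class that violates the contract on purpose:
    // equals() is always true, but hashCode() may differ per instance.
    static final class AlwaysEqual {
        final int hash;

        AlwaysEqual(int hash) {
            this.hash = hash;
        }

        @Override
        public boolean equals(Object o) {
            return true;
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    static int sizeWithDifferentHashes() {
        Set<AlwaysEqual> set = new HashSet<>();
        set.add(new AlwaysEqual(1));
        set.add(new AlwaysEqual(2)); // hash differs, so it is treated as new
        return set.size();
    }

    static int sizeWithSameHash() {
        Set<AlwaysEqual> set = new HashSet<>();
        set.add(new AlwaysEqual(7));
        set.add(new AlwaysEqual(7)); // hash matches and equals() is true: rejected
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithDifferentHashes()); // prints 2
        System.out.println(sizeWithSameHash());        // prints 1
    }
}
```

So a matching hash code alone is not enough to reject an element, and a differing hash code is enough to accept one, even when equals() would have returned true.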
TreeSet explains this in its doc:

Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
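
A quick way to see this in practice is BigDecimal, whose natural ordering is documented as inconsistent with equals: "1.0" and "1.00" compare as 0 but are not equal because their scales differ. The class and method names below are only illustrative:

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class TreeSetVsHashSetDemo {
    // Adds two values that compareTo() treats as the same
    // but equals() treats as different, then reports the set size.
    static int sizeOf(Set<BigDecimal> set) {
        set.add(new BigDecimal("1.0"));
        set.add(new BigDecimal("1.00"));
        return set.size();
    }

    public static void main(String[] args) {
        // TreeSet compares with compareTo(): the second value is a duplicate.
        System.out.println(sizeOf(new TreeSet<>())); // prints 1
        // HashSet compares with equals(): both values are kept.
        System.out.println(sizeOf(new HashSet<>())); // prints 2
    }
}
```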

For HashSet, it's an implicit expectation of the doc that the objects in the Set are correctly implemented; if hashCode() isn't correctly implemented, then it is not HashSet violating its spec but the objects being passed to it.

Louis Wasserman
  • "The behavior of a set is well-defined even if its ordering is inconsistent with equals." However, the behaviour is not as defined in the add() documentation, and that is what I was wondering about. Why isn't the documentation specific (accurate) on this? It would be better to say: e==null ? e2==null : e.compareTo(e2)==0 – user3038094 Dec 02 '13 at 21:05