I'm curious, is there any Set that only requires .equals() to determine the uniqueness?

When looking at the Set classes in java.util, I can only find HashSet, which needs .hashCode(), and TreeSet (or SortedSet in general), which requires a Comparator or Comparable elements. I cannot find any class that uses only .equals().

Does it make sense that if I have an .equals() method, it is sufficient to determine object uniqueness, so that a Set implementation could rely on .equals() alone? Or am I missing some reason why .equals() is not sufficient to determine object uniqueness in a Set implementation?

Note that I am aware of the Java practice that if we override .equals(), we should also override .hashCode() to maintain the contract defined in Object.

tkokasih
  • The last note actually raises another question: why is `.hashCode()` defined in `Object`? Is it because all objects in Java will somehow end up in a hash-based data structure? This is answered in [another question](http://stackoverflow.com/questions/8113752/why-equals-and-hashcode-were-defined-in-object) and perhaps also in [Jon Skeet's blog](http://codeblog.jonskeet.uk/2008/12/05/redesigning-system-object-java-lang-object/) – tkokasih Apr 27 '15 at 02:03
  • It's worth noting that java.util.HashSet is backed by a java.util.HashMap – djeikyb Apr 27 '15 at 02:06

3 Answers

On its own, the equals method is perfectly sufficient to implement a set correctly, but not to implement it efficiently.

The point of a hash code or a comparator is that they provide ways to arrange objects in some ordered structure (a hash table or a tree) which allows for fast finding of objects. If you have only the equals method for comparing pairs of objects, you can't arrange the objects in any meaningful or clever order; you have only a loose jumble of objects.
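As a small illustration of how the ordered structures lean on those hints, here is a sketch (the class name `OrderingHintDemo` and the sample strings are made up for this example). It shows a `TreeSet` deciding uniqueness purely from its comparator: two strings that the comparator ranks as equal are treated as the same element, even though `equals` says they differ.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class OrderingHintDemo {
    // Builds a TreeSet whose only notion of "same element" is "same length".
    static Set<String> byLength() {
        Comparator<String> byLen = Comparator.comparingInt(String::length);
        Set<String> set = new TreeSet<>(byLen);
        set.add("ab");
        set.add("cd");  // compare("cd", "ab") == 0, so the tree treats it as a duplicate
        set.add("xyz");
        return set;
    }

    public static void main(String[] args) {
        System.out.println(byLength()); // prints [ab, xyz]
    }
}
```

The tree never calls `equals` here; it navigates left or right using the comparator alone, which is what makes lookups fast.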

For example, with only the equals method, ensuring that objects in a set are unique requires comparing each added object to every other object in the jumble. Adding n distinct objects requires n * (n - 1) / 2 comparisons. For 5 objects that's 10 comparisons, which is fine, but for 1,000 objects that's 499,500 comparisons. It scales terribly.
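To check that arithmetic, the following sketch counts the `equals` calls made while adding n distinct integers to an unordered list, scanning the whole list before each insertion. The class and method names (`ComparisonCount`, `countComparisons`) are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ComparisonCount {
    // Adds 0..n-1 to a "jumble", comparing each candidate with every stored element first.
    static long countComparisons(int n) {
        List<Integer> jumble = new ArrayList<>();
        long comparisons = 0;
        for (int i = 0; i < n; i++) {
            Integer candidate = i;
            boolean found = false;
            for (Integer existing : jumble) {
                comparisons++;
                if (existing.equals(candidate)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                jumble.add(candidate);
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(countComparisons(5));    // prints 10
        System.out.println(countComparisons(1000)); // prints 499500
    }
}
```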

Because it would not give scalable performance, no such set implementation is in the standard library.


If you don't care about hash table performance, this is a minimal implementation of the hashCode method which works for any class:

```java
@Override
public int hashCode() {
    return 0; // or any other constant
}
```

Although it is required that equal objects have equal hash codes, it is never required for correctness that unequal objects have unequal hash codes, so returning a constant is legal. If you put these objects in a HashSet or use them as HashMap keys, they will end up in a jumble in a single hash table bucket. Performance will be bad, but it will work correctly.
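A quick way to convince yourself of this is the sketch below. `Coord` is a hypothetical class written only for this example; it pairs a real `equals` with the constant hash code, and a `HashSet` still deduplicates it correctly.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical value class: field-based equals(), constant hashCode().
final class Coord {
    private final int x;
    private final int y;

    Coord(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Coord)) {
            return false;
        }
        Coord other = (Coord) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return 0; // legal: equal objects trivially share a hash code
    }
}

class ConstantHashDemo {
    static Set<Coord> build() {
        Set<Coord> set = new HashSet<>();
        set.add(new Coord(1, 2));
        set.add(new Coord(1, 2)); // same bucket, equals() returns true: rejected
        set.add(new Coord(3, 4)); // same bucket, equals() returns false: stored
        return set;
    }

    public static void main(String[] args) {
        System.out.println(build().size()); // prints 2
    }
}
```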


Also, for what it's worth, a minimal working Set implementation which only ever uses the equals method would be:

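```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;

public class ArraySet<E> extends AbstractSet<E> {
    // Backing storage; elements stay in insertion order.
    private final ArrayList<E> list = new ArrayList<>();

    @Override
    public boolean add(E e) {
        // contains() scans the list and compares with equals(): O(n) per call.
        if (!list.contains(e)) {
            list.add(e);
            return true;
        }
        return false;
    }

    @Override
    public Iterator<E> iterator() {
        // The list iterator supports remove(), which the inherited remove() relies on.
        return list.iterator();
    }

    @Override
    public int size() {
        return list.size();
    }
}
```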

The set stores objects in an ArrayList, and uses list.contains to call equals on objects. Inherited methods from AbstractSet and AbstractCollection provide the bulk of the functionality of the Set interface; for example its remove method gets implemented via the list iterator's remove method. Each operation to add or remove an object or test an object's membership does a comparison against every object in the set, so it scales terribly, but works correctly.
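To see the inherited methods at work, here is a usage sketch. So that it compiles on its own, it restates the set in compact form under the made-up name `CompactArraySet`. The element type `Label` is hypothetical too: it overrides `equals` but deliberately leaves out `hashCode`, which breaks the `Object` contract and is done here only to show that this set never consults hash codes.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical element type: case-insensitive equals(), no hashCode() override
// (a contract violation, shown only for demonstration).
final class Label {
    final String text;

    Label(String text) {
        this.text = text;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Label && ((Label) o).text.equalsIgnoreCase(text);
    }
}

// Compact restatement of the equals-only set.
class CompactArraySet<E> extends AbstractSet<E> {
    private final List<E> list = new ArrayList<>();

    @Override
    public boolean add(E e) {
        return !list.contains(e) && list.add(e);
    }

    @Override
    public Iterator<E> iterator() {
        return list.iterator();
    }

    @Override
    public int size() {
        return list.size();
    }
}

class ArraySetDemo {
    static String run() {
        Set<Label> set = new CompactArraySet<>();
        boolean first = set.add(new Label("red"));
        boolean second = set.add(new Label("RED"));      // duplicate according to equals()
        boolean found = set.contains(new Label("Red"));  // inherited linear scan
        boolean removed = set.remove(new Label("rEd"));  // inherited, uses Iterator.remove()
        return first + " " + second + " " + found + " " + removed + " " + set.size();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints true false true true 0
    }
}
```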

Is this useful? Maybe, in certain special cases. For sets that are known to be very tiny, the performance might be fine, and if you have millions of these sets, this could save memory compared to a HashSet.

In general, though, it is better to write meaningful hash code methods and comparators, so you can have sets and maps that scale efficiently.

Boann
  • Yes, I'm aware that O(n) in such a critical data structure is not acceptable. I can imagine the `Set` being used inside a loop, which would push the performance to O(n^2). But OTOH, I wonder whether some correctness mistakes could be prevented by providing such an implementation. See my comment on Ekleog's answer. – tkokasih Apr 27 '15 at 03:14
  • @wannabeprogrammer I've added an example of how to implement such a set. – Boann Apr 27 '15 at 03:18
  • Thank you. I'm not looking for a workaround, though, just a rationale for why there is no such implementation in the standard API. I think such a data structure might prevent errors by programmers who don't override hashCode when they override equals. I'll accept your answer because it addresses my question: 1) I'm not missing anything, and `.equals()` is sufficient to build a `Set` data structure, and 2) a possible reason why there is no such implementation in the standard API is performance. Thank you :) – tkokasih Apr 28 '15 at 01:16

You should always override hashCode() when you override equals(). The contract for Object clearly specifies that two equal objects have identical hash codes, and a surprising number of data structures and algorithms depend on this behavior. It's not difficult to add a hashCode(), and if you skip it now, you'll eventually get hard-to-diagnose bugs when your objects start getting put in hash-based structures.
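For reference, a matching pair can be as short as the sketch below. `Person` and its fields are invented for this illustration; `Objects.equals` and `Objects.hash` are standard `java.util` helpers. The only rule being followed is that `hashCode` is computed from the same fields that `equals` compares.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class with a consistent equals()/hashCode() pair.
final class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Person)) {
            return false;
        }
        Person other = (Person) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields that equals() compares.
        return Objects.hash(name, age);
    }
}

class EqualsHashCodeDemo {
    static int distinctPeople() {
        Set<Person> people = new HashSet<>();
        people.add(new Person("Ann", 30));
        people.add(new Person("Ann", 30)); // equal, with the same hash code: rejected
        return people.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctPeople()); // prints 1
    }
}
```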

chrylis -cautiouslyoptimistic-

It would mathematically make sense to have a set that requires nothing but .equals().

But such an implementation would be so slow (linear time for every operation) that the standard library instead expects you to supply a hint: a hash code or an ordering.

Anyway, if there is really no way you can write a hashCode(), just make it always return 0 and you will have a structure that is as slow as the one you hoped for!

Ekleog
  • Indeed. I am aware of the O(n) performance. But OTOH, our inherited codebase seems to suffer from an identity problem whereby many classes override `.equals()` but not `.hashCode()`, and worse, these classes are used in hash-based data structures. So I'm wondering whether the former programmers' line of thought was something like, "I need a Set. TreeSet would need a Comparator. Nah, just use HashSet". I also wonder whether this kind of error could be prevented by having an `.equals()`-only implementation of `Set`. It would cost O(n) but be correct, instead of O(1) but incorrect. – tkokasih Apr 27 '15 at 03:06
  • @wannabeprogrammer why not fix your codebase straight away? Why some awkward workaround that is only going to lead to more trouble in the end? – MarioDS Apr 27 '15 at 08:25
  • Especially given that writing a `hashCode` function is really easy as long as you are not after optimal performance, and examples on the internet can help you write efficient ones. – Ekleog Apr 27 '15 at 11:39
  • @MDeSchaepmeester: Yep, I already raised this with my senior dev, and we are trying to get the necessary resources, mainly time from the stakeholder, because there are hundreds of them. And no, I'm not asking because I'm looking for a workaround; I'm just curious about why these errors can be so pervasive. The design of Object, which forces equality and hash code on every class, might be one reason ([ref](http://codeblog.jonskeet.uk/2008/12/05/redesigning-system-object-java-lang-object/)), and I think the lack of an `.equals()`-only `Set` might also contribute. Hence the question of why it's not there. – tkokasih Apr 28 '15 at 01:07
  • @Ekleog, Indeed, we are planning to use Eclipse's equals and hashCode generator to expedite our effort. – tkokasih Apr 28 '15 at 01:09