6

I know there are other questions about general best practices for overriding hashCode and equals, but I have a very specific question.

I have a class that has, as an instance variable, an array of the same class. To be more explicit, here's the code:

class Node {
    Node[] arr = new Node[5];
}

I need to override hashCode for the class Node, and the array is an important, deciding factor in determining whether two Nodes are the same. How can I incorporate the array into the calculation of hashCode effectively?

--Edit--

I'm trying to check if the two nodes are the same, meaning that they have the same number of children, and that those children lead to the exact same states. Therefore, I'm effectively trying to compare the subtrees at the two nodes. I'm wondering if I can use hashing to do this equality check.

I think I actually need to hash the entire subtree, but I'm not sure how I'd go about doing that given the recursive nature of my class definition.

efficiencyIsBliss

5 Answers

4

Include Arrays.hashCode(Object[]) (http://download.oracle.com/javase/6/docs/api/java/util/Arrays.html#hashCode(java.lang.Object[])) as part of the hashCode() implementation.
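A minimal sketch of what that could look like, assuming the Node class from the question and assuming child order matters. The equals override is my addition so that the two methods stay consistent. Arrays.hashCode(Object[]) calls hashCode() on every non-null element, so with Node overriding hashCode() the call already walks the whole subtree.

```java
import java.util.Arrays;

// Hypothetical version of the question's Node class; the fixed-size child array is kept.
class Node {
    final Node[] arr = new Node[5];

    @Override
    public int hashCode() {
        // Arrays.hashCode(Object[]) uses 0 for null slots and element.hashCode()
        // otherwise, so this recurses into every child node.
        return Arrays.hashCode(arr);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Node)) return false;
        // Arrays.equals(Object[], Object[]) compares slot by slot with equals(),
        // which again recurses into the children.
        return Arrays.equals(arr, ((Node) other).arr);
    }
}
```

Both methods recurse without any cycle detection, so this is only safe if no node can be reached from itself; with a cycle the recursion would never terminate normally.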

Joseph Ottinger
  • I just edited the question to more accurately reflect my needs. Given that I want to check for the equality of the subtrees, I don't think I can use the method you are referencing. There was another method called deepHashCode() mentioned in see also, but the description said that it cannot be used on an array that contained itself, though I'm not sure if my array 'contains itself'. – efficiencyIsBliss May 04 '11 at 17:48
  • @efficiencyIsBliss If you can't guarantee that there are no recursive/cyclic references between your nodes, then you run the risk of infinite loops. The methods that ship with Java don't check for these situations. – Jérôme Verstrynge May 04 '11 at 18:10
2

I'm trying to check if the two nodes are the same, meaning that they have the same number of children, and that those children lead to the exact same states. Therefore, I'm effectively trying to compare the subtrees at the two nodes. I'm wondering if I can use hashing to do this equality check.

No, hashing should not be used to check equality. That's not its purpose. It can help you find out that objects are not equal, but it won't tell you anything about whether they are equal.

Equal objects always produce the same hash value, but two objects that are not equal can produce the same hash too. In other words, if the hash values differ, you know for sure that the objects are different. That's it.
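A small illustration of that asymmetry, using two JDK strings that happen to collide. The class and method names here are hypothetical and only serve the demo.

```java
// Shows that equal hash codes do not imply equal objects.
class HashCollisionDemo {
    // True when the two objects share a hash code but are not equal.
    static boolean sameHashButNotEqual(Object x, Object y) {
        return x.hashCode() == y.hashCode() && !x.equals(y);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" both hash to 2112 under String.hashCode().
        System.out.println(sameHashButNotEqual("Aa", "BB")); // prints "true"
    }
}
```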

If you want to test equality, you need to implement equals. In your case, there is a danger that your method will go recursive and provoke a stack overflow. What if your object contains a reference to itself?

If you want to generate a hash, you could take the size of the array into account (and whether it is null), but I would not try to use the hash values of the objects in the array, because of potential infinite loops. It is not perfect, but it is good enough.

There is another, more radical method that may give good results too. Instead of computing hash values dynamically, assign a random int value to each Node instance (once, at creation, and always return that value). In your case, you would then not risk infinite loops when taking the hash values of the instances in your array.

If the hashes are equal, you then need to compare the instances in the arrays.

REM: If Nodes contain other attributes, compute the hash from those attributes and forget about the array. Look at the array content and size if and only if the hashes of the two objects are identical.

REM2: The comments mention a DAG, which means we won't hit recursion issues. However, that condition alone does not guarantee that deepHashCode() will succeed. Moreover, it would be overkill. There is a more efficient way to solve this issue.

If the hash method used by Node only uses the array to compute a hash value, then deepHashCode() may work. But it would not be efficient. If the hash method uses other node attributes, then these attributes would have to be equal too.

There is a faster way to compare nodes for equality. Mark each node instance with a unique number. Then, to compare two nodes, compare their array sizes first. If they are equal, compare the nodes in each array using their unique numbers. If one array does not 'have' a node from the other, then we are not dealing with equal nodes. This solution is much faster than going recursive.
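One way the unique-number idea might be sketched. IdNode, NEXT_ID and sameChildren are hypothetical names, and I'm assuming a slot-by-slot comparison (child position matters), which the text above does not spell out.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical node type that carries a unique number assigned at creation.
class IdNode {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    final int id = NEXT_ID.getAndIncrement(); // assigned once, never changes
    final IdNode[] arr = new IdNode[5];

    // Shallow comparison: same array length and the same child ids slot by slot.
    // No recursion, so cycles between nodes cannot cause an infinite loop.
    static boolean sameChildren(IdNode a, IdNode b) {
        if (a.arr.length != b.arr.length) return false;
        for (int i = 0; i < a.arr.length; i++) {
            IdNode x = a.arr[i];
            IdNode y = b.arr[i];
            if (x == null || y == null) {
                if (x != y) return false; // one slot empty, the other not
            } else if (x.id != y.id) {
                return false;
            }
        }
        return true;
    }
}
```

Note that this treats two nodes as equal only when they point at the very same child instances. Two separately built subtrees with the same shape would compare as different.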

Jérôme Verstrynge
  • I disagree that hashing should not be used for equality checks. Indeed, the Javadoc on hashing clearly states that if two objects are equal according to the equals() method, then they must hash to the same thing. – efficiencyIsBliss May 04 '11 at 18:10
  • @efficiencyIsBliss - the thing is, [two unequal objects can *also* have the same hashCode](http://download.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()). – justkt May 04 '11 at 18:23
  • 1
    @efficiencyIsBliss I am not going against what you say. In fact, we are saying the same thing: hashing can SUPPORT the process of finding out whether two objects are equal, BUT it cannot be a substitute. – Jérôme Verstrynge May 04 '11 at 18:26
1

It depends on what your criteria for equality are. Is the order in the array important? If so, you'll probably want to make the hash code depend on the order of the nodes in the array. If not, you may want to do something like XOR-ing the hash codes of all the nodes within the array. Presumably some of the values may be null (so be careful of that).
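To make the two options concrete, here is a rough sketch of both combinations. ChildHashes and its method names are made up for illustration; the order-sensitive variant uses the same formula that Arrays.hashCode(Object[]) is documented to use.

```java
// Two hypothetical ways to combine child hash codes.
class ChildHashes {
    // Order-sensitive: the usual 31-multiplier combination, null counts as 0.
    static int ordered(Object[] children) {
        int h = 1;
        for (Object c : children) {
            h = 31 * h + (c == null ? 0 : c.hashCode());
        }
        return h;
    }

    // Order-insensitive: XOR is commutative, so any permutation gives the same result.
    static int unordered(Object[] children) {
        int h = 0;
        for (Object c : children) {
            if (c != null) h ^= c.hashCode(); // skip empty slots
        }
        return h;
    }
}
```

One thing to keep in mind with XOR is that two equal children cancel each other out (x ^ x is 0), so it works best when the children are distinct.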

Basically, you need to override hashCode and equals consistently such that if two objects are equal, they will have the same hash code. That's the golden rule.

Eric Lippert has a great blog post about GetHashCode in .NET - the advice applies equally well for Java.

One potential problem to be aware of - if you end up with a cycle in your nodes (a reference to node A appearing in the array of node B and vice versa) you could end up having a cycle in the hash code calculation too.

Jon Skeet
1

You can use Arrays.hashCode() and Arrays.equals() methods.

Yasin Bahtiyar
0

A few points to add to the current answers, if performance is of any concern.

First, you need to decide whether the order of the child nodes in a node matters. If it doesn't, you cannot use the hashcode for an array. Consider fashioning your hashcode function around that defined by java.util.Set. Also consider using some ordering internally to improve equals performance. For example, if the subtrees' depths/heights differ, you can sort by depth.

Second, if your subtrees are deep, your hashcode can get very expensive. So I would cache the hashcode, and compute it upon construction (if your node is immutable), or invalidate upon mutation and re-calculate on demand.

Third, if your subtrees are deep, check the hashcode in equals() and return false early. Yes, hashcode is inspected by Map implementations, but there are places where code simply compares two objects using equals(), and they may pay a big price.
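The second and third points could be combined roughly like this, assuming an immutable node whose children are passed to the constructor. CachedNode is a hypothetical name, and this is a sketch rather than a tuned implementation.

```java
import java.util.Arrays;

// Hypothetical immutable node that caches its hash at construction time.
final class CachedNode {
    private final CachedNode[] children;
    private final int hash; // computed once; valid because the node never changes

    CachedNode(CachedNode... children) {
        this.children = children.clone(); // defensive copy keeps the node immutable
        // The children already exist, so their hashes are already cached and this
        // step costs O(number of children) rather than O(subtree size).
        this.hash = Arrays.hashCode(this.children);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CachedNode)) return false;
        CachedNode that = (CachedNode) other;
        if (hash != that.hash) return false; // cheap early exit
        return Arrays.equals(children, that.children); // full recursive check
    }
}
```

Because children must be built before their parent, this shape also rules out cycles. A mutable node would instead need to invalidate the cached value on every change, including changes anywhere below it.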

Finally, consider using Arrays.asList() (if child ordering matters) or HashSet (if ordering doesn't matter and no two child nodes are equal) instead of a simple array. Then equals and hashcode are reduced to delegating the call to the container instance... with appropriate caching of hashcode, of course.
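For the ordered case, delegation might look like the following. ListNode and setChild are hypothetical names, and I'm assuming the same five fixed slots as in the question; Arrays.asList gives a fixed-size List view whose equals and hashCode are order-sensitive and tolerate null elements.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical node that delegates equals/hashCode to a List of children.
class ListNode {
    // Fixed-size list backed by a 5-slot array; empty slots are null.
    private final List<ListNode> children = Arrays.asList(new ListNode[5]);

    void setChild(int index, ListNode child) {
        children.set(index, child);
    }

    @Override
    public int hashCode() {
        return children.hashCode(); // List.hashCode() recurses into each child
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ListNode
                && children.equals(((ListNode) other).children);
    }
}
```

This version is mutable and does not cache anything, so every hashCode() call walks the whole subtree, and it still assumes there are no cycles.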

Dilum Ranatunga