8

I have a cyclic graph-like structure that is represented by Node objects. A Node is either a scalar value (leaf) or a list of n>=1 Nodes (inner node).

Because of the possible circular references, I cannot simply use a recursive HashCode() function that combines the HashCode() of all child nodes: it would end up in infinite recursion.

While the HashCode() part seems at least doable by flagging and ignoring already visited nodes, I'm having trouble thinking of a working and efficient algorithm for Equals().

To my surprise I did not find any useful information about this, but I'm sure many smart people have thought about good ways to solve these problems...right?

Example (python):

A = [ 1, 2, None ]; A[2] = A  
B = [ 1, 2, None ]; B[2] = B

A is equal to B, because they represent exactly the same graph.
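To make the problem concrete, here is a minimal sketch of the naive recursive approach, assuming nodes are plain Python lists (inner nodes) and scalars (leaves) as in the example above. `naive_hash` and `outcome` are hypothetical helper names used only for illustration.

```python
def naive_hash(node):
    """Combine the hashes of all children, with no protection against cycles."""
    if not isinstance(node, list):      # leaf: a scalar value
        return hash(node)
    result = 17
    for child in node:                  # inner node: recurse into every child
        result = 31 * result + naive_hash(child)
    return result


A = [1, 2, None]; A[2] = A
B = [1, 2, None]; B[2] = B


def outcome(thunk):
    """Run thunk and report whether it finished or exceeded the recursion limit."""
    try:
        thunk()
        return "finished"
    except RecursionError:
        return "RecursionError"


hash_outcome = outcome(lambda: naive_hash(A))   # naive recursive hash
eq_outcome = outcome(lambda: A == B)            # Python's built-in list comparison
print(hash_outcome, eq_outcome)
```

Both the naive hash and the built-in `==` keep following the cycle until the interpreter's recursion limit is hit, so neither can be used as-is on such a structure.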

BTW, this question is not targeted at any specific language, but implementing hashCode() and equals() for the described Node object in Java would be a good practical example.

Tomerikoo
mfya

3 Answers

0

I would like to know a good answer as well. So far I use a solution based on the visited set.

When computing the hash, I traverse the graph structure and keep a set of visited nodes, so I never enter the same node twice. When I hit an already visited node, the hash function returns a number without recursing.

This works even for the equality comparison. I compare node data and recurse into the children. When I hit an already visited node, the comparison returns true without recursing.
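The hash half of this idea could be sketched roughly as follows, assuming nodes are plain Python lists and scalars as in the question. `cyclic_hash`, `BACK_REF`, and `MASK` are hypothetical names, and the constants are arbitrary choices rather than anything prescribed by the answer.

```python
MASK = (1 << 64) - 1              # keep the running hash within 64 bits
BACK_REF = 0x9E3779B97F4A7C15     # arbitrary number returned for revisited nodes


def cyclic_hash(node, visited=None):
    """Hash a node while entering each inner node at most once."""
    if visited is None:
        visited = set()           # ids of the inner nodes already entered
    if not isinstance(node, list):
        return hash(node) & MASK  # leaf: hash the scalar directly
    if id(node) in visited:
        return BACK_REF           # already visited: return a number, no recursion
    visited.add(id(node))
    result = 17
    for child in node:
        result = (31 * result + cyclic_hash(child, visited)) & MASK
    return result


A = [1, 2, None]; A[2] = A
B = [1, 2, None]; B[2] = B
print(cyclic_hash(A) == cyclic_hash(B))
```

One caveat under these assumptions: because the hash depends on which nodes are shared, two graphs that unfold to the same infinite tree but share nodes differently can receive different hashes, so the matching equality check has to respect sharing as well.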

jk_
-1

If you think about this as a graph, a leaf node is a node that has only one connection and a complex node is one with at least two. Once you see it that way, implement a simple BFS algorithm which applies the hash function to each node it passes and then drops the result. This way you ensure that you won't fall into cycles or go through any node more than once.

The implementation could be very straightforward. Read about it here.

guiman
  • Thanks for your response; however, this doesn't really help me: – mfya Dec 27 '10 at 19:02
  • I can't calculate the hash of a node before I know the hashes of all its children. With a BFS or DFS I can avoid infinite recursion, but the hash I get may not be a good one if there are cycles: for example, when I ignore the nodes I see a second time, the hash of a node would be the same as that of a node without the cycle. I was looking for a hashing algorithm that makes sense and doesn't introduce unnecessary collisions (if one exists for this case). – mfya Dec 27 '10 at 19:09
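One possible way to reduce the collisions described in that comment, offered only as a sketch and not as an established algorithm from this thread: instead of ignoring a revisited node, let it contribute the index at which it was first discovered during the depth-first walk. The function name `graph_hash`, the tag values, and the list-based node representation are all assumptions made for illustration.

```python
MASK = (1 << 64) - 1   # keep the running hash within 64 bits


def graph_hash(node, order=None):
    """Hash where a revisited node contributes its DFS discovery index."""
    if order is None:
        order = {}                                # id(node) -> discovery index
    if not isinstance(node, list):
        return hash((0, node)) & MASK             # tag 0: leaf value
    if id(node) in order:
        return hash((1, order[id(node)])) & MASK  # tag 1: back-reference
    order[id(node)] = len(order)
    result = 17
    for child in node:
        result = (31 * result + graph_hash(child, order)) & MASK
    return result


# X's inner node points back to the root; Y's inner node points to itself.
X = [1, None]; X[1] = [2, X]
inner = [2, None]; inner[1] = inner
Y = [1, inner]

print(graph_hash(X) != graph_hash(Y))
```

With this scheme a cycle is hashed differently from a missing child, and cycles that close at different nodes are told apart. The trade-off is that the hash is tied to the exact sharing structure, so it only suits an equality that also treats differently shaped cycles as different.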
-1

You need to walk the graphs.

Here's a question: are these graphs equal?

A = [1,2,None]; A[2] = A
B = [1,2,[1,2,None]]; B[2][2] = B

If so, you need a set of (Node, Node) tuples. Use this set to catch loops, and return 'true' when you catch a loop.
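A minimal sketch of this pair-set approach, assuming nodes are plain Python lists and scalars as in the question; `graphs_equal` is a hypothetical name. Node identity is tracked with `id()` because lists are not hashable.

```python
def graphs_equal(a, b, seen=None):
    """Compare two possibly cyclic nodes, treating a revisited pair as equal."""
    if seen is None:
        seen = set()                    # (id, id) pairs already under comparison
    a_inner, b_inner = isinstance(a, list), isinstance(b, list)
    if not a_inner or not b_inner:
        # At least one leaf: equal only if both are leaves with equal values.
        return not a_inner and not b_inner and a == b
    pair = (id(a), id(b))
    if pair in seen:
        return True                     # loop caught: no mismatch on this path
    seen.add(pair)
    if len(a) != len(b):
        return False
    return all(graphs_equal(x, y, seen) for x, y in zip(a, b))


A = [1, 2, None]; A[2] = A
B = [1, 2, [1, 2, None]]; B[2][2] = B
print(graphs_equal(A, B))
```

Returning true for a pair that is already in the set is safe here because any real mismatch elsewhere in the walk still makes the overall result false. Under this definition the two graphs above compare equal, since both unfold to the same infinite sequence 1, 2, 1, 2, and so on.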

If not, you can be a bit more efficient and use a map from Node to Node. Then, when walking the graphs, build up a set of correspondences. (In the case above, A would correspond to B, A[2] would correspond to B[2], etc.) Then, when you visit a node pair that has been seen before, you make sure the exact pair is in the map; if it isn't, the graphs don't match.
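The stricter, map-based variant might look like the sketch below, again assuming list-based nodes. `graphs_isomorphic` is a hypothetical name, and the second, reverse map is an addition not mentioned in the answer: it is included so that the result does not depend on the order of the arguments.

```python
def graphs_isomorphic(a, b, a_to_b=None, b_to_a=None):
    """Compare two nodes, requiring a one-to-one correspondence between inner nodes."""
    if a_to_b is None:
        a_to_b, b_to_a = {}, {}         # id -> id maps, one per direction
    a_inner, b_inner = isinstance(a, list), isinstance(b, list)
    if not a_inner or not b_inner:
        # At least one leaf: equal only if both are leaves with equal values.
        return not a_inner and not b_inner and a == b
    if id(a) in a_to_b or id(b) in b_to_a:
        # Seen before: this exact pair must already be recorded in both maps.
        return a_to_b.get(id(a)) == id(b) and b_to_a.get(id(b)) == id(a)
    a_to_b[id(a)] = id(b)
    b_to_a[id(b)] = id(a)
    if len(a) != len(b):
        return False
    return all(graphs_isomorphic(x, y, a_to_b, b_to_a) for x, y in zip(a, b))


A = [1, 2, None]; A[2] = A
A2 = [1, 2, None]; A2[2] = A2
B = [1, 2, [1, 2, None]]; B[2][2] = B
print(graphs_isomorphic(A, A2), graphs_isomorphic(A, B))
```

Here A and A2 match, while A and B do not: A would have to correspond to both B and B[2], which the map rejects.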

John Doty