
I am thinking about ways to represent a graph in memory.

I was thinking of using a hash map of hash maps, so that it behaves similarly to an adjacency matrix, but lets us use Comparable edge labels instead of just integers.
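A minimal sketch of that map-of-maps idea (the class and method names are my own, not from the question): edge lookup is O(1) expected time, like an adjacency matrix, but the labels can be any Comparable type.

```java
import java.util.HashMap;
import java.util.Map;

public class LabeledGraph<V, E extends Comparable<E>> {
    private final Map<V, Map<V, E>> adj = new HashMap<>();

    public void addEdge(V from, V to, E label) {
        adj.computeIfAbsent(from, k -> new HashMap<>()).put(to, label);
        adj.computeIfAbsent(to, k -> new HashMap<>()); // make sure 'to' exists as a node
    }

    // Matrix-style lookup: the label on edge (from, to), or null if absent.
    public E label(V from, V to) {
        Map<V, E> row = adj.get(from);
        return row == null ? null : row.get(to);
    }

    // One "row" of the matrix: all outgoing edges of v with their labels.
    public Map<V, E> neighbors(V v) {
        return adj.getOrDefault(v, Map.of());
    }
}
```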

In breadth-first search and Dijkstra's algorithm, we have to iterate through adjacency lists and add nodes to the queue. This leads to my question:

Is iteration through a linked hash set more efficient than iteration through a regular HashSet in Java?

It seems like it would be, because the entries are linked together in insertion order, so we do not have to iterate through empty bins if any exist (depending on the HashMap's load factor, there could be more or fewer of these). This would let us combine the random-access behavior of an adjacency matrix with the search-algorithm efficiency of an adjacency list.
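A small demonstration of the property in question (my own example): a LinkedHashSet threads a doubly linked list through its entries, so iteration visits exactly size() elements in insertion order, no matter how large the backing table is, whereas a plain HashSet's iterator scans every bucket, occupied or not.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LinkedIterationDemo {
    public static void main(String[] args) {
        // Deliberately oversized table with only three elements:
        Set<Integer> linked = new LinkedHashSet<>(1 << 16);
        linked.add(3);
        linked.add(1);
        linked.add(2);

        // Iteration follows the linked list, so insertion order is
        // preserved regardless of the table's capacity.
        List<Integer> order = new ArrayList<>(linked);
        System.out.println(order); // [3, 1, 2]
    }
}
```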

mk3009hppw
    *FYI:* You are essentially asking for the better data structures for given algorithms, thus the question may fit better in [Computer Science](https://cs.stackexchange.com/). – akuzminykh Apr 02 '20 at 23:50
    I'm asking about how certain Java data structures work, not about which one works better for a particular algorithm. So I think it fits here. I am just giving context as to why I would need it. – mk3009hppw Apr 02 '20 at 23:53

1 Answer


Yes, you're right.

Java's HashMap/HashSet performs poorly when iterating over a sparse graph, because the iterator must also traverse empty bins. When most nodes are connected to only one other node, the iterator may need to scan several bins to find each element. A related analysis can be found on Codeforces: Performance of hash set iterators in different programming languages.

Also, when representing a graph, Java generics force you to use wrapper objects such as Integer instead of primitives. The boxing and unboxing slow down both building and traversing the graph. In practice, you may want to use Trove or another primitive-collection library, or implement the structure yourself.
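A sketch of one boxing-free, implement-it-yourself approach (my own illustration, under the assumption that node ids are 0..n-1): adjacency lists packed into primitive int arrays, the same idea primitive-collection libraries pursue.

```java
import java.util.Arrays;

public class IntGraph {
    private final int[] head; // head[u] = index of u's first edge, or -1
    private final int[] next; // next[e] = index of the edge after e in u's list
    private final int[] dest; // dest[e] = target node of edge e
    private int edgeCount = 0;

    public IntGraph(int nodes, int maxEdges) {
        head = new int[nodes];
        Arrays.fill(head, -1);
        next = new int[maxEdges];
        dest = new int[maxEdges];
    }

    public void addEdge(int u, int v) {
        dest[edgeCount] = v;
        next[edgeCount] = head[u];
        head[u] = edgeCount++;
    }

    // Iterating u's neighbors touches only u's own edges — no empty
    // bins and no Integer boxing.
    public int degree(int u) {
        int d = 0;
        for (int e = head[u]; e != -1; e = next[e]) d++;
        return d;
    }
}
```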

Morris Yang
    From my experience the LinkedHashSet is about twice as fast as the normal HashSet (depending on the load factor, of course). Remember that HashSets in Java don't shrink their capacity when you delete elements. Therefore the normal HashSet is asymptotically worse: iteration takes O(capacity) instead of O(size). Unless you delete elements, though, capacity and size are roughly proportional. The Graph implementation of Guava also uses LinkedHashSets as values in an adjacency map, btw. – Moritz Groß Apr 03 '20 at 21:57