1

I'm new to coding and am currently learning about the HashSet container in Java. What really puzzles me is that the internal implementation of HashSet creates a private HashMap object to store its values, along with a singleton object PRESENT.

So my question is:

  • Why would the HashSet structure need a HashMap object to store its values? (Why not use an array or a linked structure?)
  • What is the purpose of the singleton object PRESENT? (Is it used to determine whether an insert was successful?)
Chopping

3 Answers

7

A HashSet can be viewed as a special case of a HashMap, where we only care about the keys.

Using a HashMap instance as the HashSet implementation is a means to avoid code duplication. Rather than copying a significant portion of the HashMap code into the HashSet class (all the code that manages the array of buckets, including the linked list or tree structure within each bucket, and that locates the bucket matching a given key), the JDK developers chose to reuse the HashMap code.

The PRESENT instance is a dummy instance used as a value in the backing HashMap of the HashSet. It is used to avoid allocating multiple dummy values.

This is stated in a comment:

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
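
To make the delegation concrete, here is a minimal sketch of a set that wraps a HashMap in the way described above. The class name SimpleHashSet is hypothetical and this is a simplified illustration, not the JDK source; only the idea of a shared PRESENT dummy value comes from the answer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration only; it is not the JDK's HashSet.
class SimpleHashSet<E> {
    // One shared dummy value is stored for every key, so no per-entry
    // dummy object ever has to be allocated.
    private static final Object PRESENT = new Object();

    // The backing map does all the hashing and bucket management.
    private final Map<E, Object> map = new HashMap<>();

    // Map.put returns the previous value, or null if the key was absent,
    // so a null result means the element was newly added.
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    // Map.remove returns the previous value, which is PRESENT only if
    // the element was actually in the set.
    boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    int size() {
        return map.size();
    }
}
```

Every set operation here is a one-line call into the map, which is the code-reuse argument in miniature.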
Eran
2

Why would a HashSet structure need a HashMap object to store its values? (Why not use an array or a linked structure?)

Technically, it doesn't need one.

However, it is much easier [1] for the Java team to maintain a single implementation for something as complex as a HashMap / HashSet. (Note that the complexity is needed to allow the implementations to work well for a diverse range of use cases.)

There is a memory overhead of 1 reference per entry in implementing HashSet as a wrapper for HashMap. However, that is small enough that "they" deem it to be acceptable. And if it is not acceptable to you, then you are free to implement and maintain your own improved version of the HashSet class [2].

What is the usage of the singleton object PRESENT?

The PRESENT instance is an implementation detail. It is the dummy value used as the value in the wrapped HashMap instance.

Used to determine whether the insert is successful?

In part, yes.
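
As a hedged illustration of the "in part": Map.put returns the previous value for a key, or null if there was none, so a non-null dummy value lets the caller tell a new element from a duplicate. The sketch below uses a plain HashMap and hypothetical helper names (PresentDemo, addWithDummy, addWithNull) to contrast a non-null dummy with storing null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the helper names are made up for illustration.
class PresentDemo {
    static final Object PRESENT = new Object();

    // With a non-null dummy, put returns null only when the key was absent,
    // so the result says whether the element was newly inserted.
    static boolean addWithDummy(Map<String, Object> map, String key) {
        return map.put(key, PRESENT) == null;
    }

    // With null as the stored value, put returns null for new keys and for
    // existing keys alike, so this reports "inserted" even for duplicates.
    static boolean addWithNull(Map<String, Object> map, String key) {
        return map.put(key, null) == null;
    }

    public static void main(String[] args) {
        Map<String, Object> withDummy = new HashMap<>();
        System.out.println(addWithDummy(withDummy, "x")); // true
        System.out.println(addWithDummy(withDummy, "x")); // false

        Map<String, Object> withNull = new HashMap<>();
        System.out.println(addWithNull(withNull, "x")); // true
        System.out.println(addWithNull(withNull, "x")); // true (duplicate not detected)
    }
}
```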


1 - It is revealing that HashSet / HashMap underwent a major overhaul recently to improve performance ... but the Java team didn't take that opportunity to separate the implementations.

2 - You won't be the first person to do that. However, you are likely to find that it is difficult to substantially improve HashSet performance across the board (i.e. for all use-cases) ... and still implement the java.util.Map API correctly. Factoring out the HashMap.Node class's value field is probably the only big win.

Stephen C
  • Thank you so much! I think I should read some books about `Design Patterns` – Chopping Jun 14 '18 at 10:59
  • 2
    I don't actually think that is directly relevant to this question. In this case, the real answer boils down to business economics: there is insufficient benefit to justify someone (especially Oracle) spending the effort of developing and maintaining a new `HashSet` class that uses a bit less memory and is a tiny bit faster. If anything, this is an example of the DRY principle. – Stephen C Jun 14 '18 at 11:02
1

HashSet is called HashSet because it uses HashMap to do its work. HashMap is a very handy structure that allows you to find information related to some key very quickly, as long as that key has a nice hash function defined for it.

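Trivially, if a set were implemented using linked lists, it would be called LinkedListSet and not HashSet, and it would be much, much slower, because checking membership would mean scanning the elements one by one instead of jumping straight to a bucket. Ditto for arrays.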

A PRESENT singleton is used simply because HashMap needs to store something; it does not matter what it is for purposes of HashSet as long as something is either there or not, so might as well always be the same thing.

Before Set came to JavaScript and Perl, you would see this pattern very often, where one would simply take an object (JS) or a hash (Perl) and stuff a true or 1 in it for every present member. So even without the dedicated HashMap object, the optimal solution was basically the same idea.
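
For comparison, the same "map with a throwaway value" pattern can be sketched in Java. The class and method names below (MapAsSetDemo, seenWords, setView) are hypothetical. The first method hand-rolls the pattern with Boolean.TRUE as the dummy; the second uses the standard Collections.newSetFromMap, which wraps an empty Map<E, Boolean> in a Set view.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class for illustration.
class MapAsSetDemo {
    // Hand-rolled pattern: every member maps to the same throwaway value.
    static Map<String, Boolean> seenWords(String... words) {
        Map<String, Boolean> seen = new HashMap<>();
        for (String w : words) {
            seen.put(w, Boolean.TRUE);
        }
        return seen;
    }

    // Library version: a Set view backed by the (initially empty) map.
    static Set<String> setView(String... words) {
        Set<String> set = Collections.newSetFromMap(new HashMap<String, Boolean>());
        Collections.addAll(set, words);
        return set;
    }
}
```

In both cases membership is just a key lookup; the stored value carries no information.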

It would be somewhat more memory-efficient to implement the same functionality on a bit-vector, since the only values allowed are non-present or present, but it would involve more work and duplicating existing functionality. The part which finds which index of the array holds the value for which key would be the same, though.

Amadan