2

Editing the question to remove my analogy and ask directly. JDK HashSet implementation is like below:

public class HashSet {
  private HashMap map;
  public HashSet(int capacity) {
    map = new HashMap(capacity);
  }

  public HashSet(int capacity, float loadFactor) {
    map = new HashMap(capacity, loadFactor);
  }

  HashSet(int capacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<>(capacity, loadFactor);
  }
}

And LinkedHashSet is implemented like this:

public class LinkedHashSet
  extends HashSet {
  public LinkedHashSet(int capacity) {
    super(capacity, 0, true);
  }
  public LinkedHashSet(int capacity, float loadFactor) {
    super(capacity, loadFactor, true);
  }
}

The third constructor in class HashSet : HashSet(int capacity, loadFactor, boolean dummy) exists only to help LinkedHashSet class use a LinkedHashMap as the backing map instead of default HashMap

Questions:

  • Would this be considered a good design?
  • Wouldn't it be better to let the subclass specify the backing map type?
  • It took me 30 minutes to locate in JDK source code where exactly is the doubly linked list for LinkedHashSet because the above implementation paradigm did not intuitively occur to me, when would such a design be a good choice?
vsnyc
  • 2,117
  • 22
  • 35
  • Why would you want to do this? Seems like an awful lot of effort and confusion for no apparent reason. Most of the time, you shouldn't care about the implementation--program to interfaces. – Eric Hughes Mar 27 '15 at 20:02
  • So, your question is in fact not on the design of A and B, but on the design of HashSet and LinkedHashSet, right? – JB Nizet Mar 27 '15 at 20:03
  • This design is tightly coupled to these Map implementations. Why don't you just inject the map you want into the constructor of A, rather than having a cryptic set of constructors to choose from. – Andy Turner Mar 27 '15 at 20:03
  • @JBNizet Yes, I want to understand why HashSet and LinkedHashSet were implemented like this – vsnyc Mar 27 '15 at 20:04
  • @AndyTurner, that's my confusion as well. Such a design would be frowned upon, yet the JDK has implemented it so – vsnyc Mar 27 '15 at 20:05
  • @EricHughes, I would not do it like this myself, but the JDK has. In my example, Class A is an analogous to `HashSet` and Class B to LinkedHashSet – vsnyc Mar 27 '15 at 20:07
  • 1
    The java API is littered with bits of code which are now considered "bad design" (e.g. java.util.Date), but that can't be removed because they have been released as a public part of the API, so removing them could break an unknown amount of code. The lesson here is to learn from history, and do better next time. – Andy Turner Mar 27 '15 at 20:11
  • I don't get the question. How does `HashMap` do this? `HashMap` doesn't have a constructor with a dummy parameter. – Paul Boddington Mar 27 '15 at 20:15
  • 1
    @pbabcdefp the question is confusing. It's actually: why are HashSet and LinkedHashSet designed this way. A == HashSet. B == LinkedHashSet. – JB Nizet Mar 27 '15 at 20:16
  • @vsnyc one thing it ensures is that there is no way for any other subclass to construct a HashSet with another subclass of HashMap as backing map. Since the base class is the only one to intnatiate the backing map, it can, for example, implement readObject() as it does it (testing if the map is an instance of HashMap or LinkedHashMap, without caring about any other potential map type). Robustness through encapsulation was probably preferred over cleaner design and extensibility. – JB Nizet Mar 27 '15 at 20:24
  • @JBNizet, That is a good point, as `readObject()` method as it is implemented now will break for arbitrary HashMap types, but it would have been better to let subclasses override `readObject()` method than provide a implementation that will also work for a subclass. – vsnyc Mar 27 '15 at 20:43
  • @AndyTurner: That isn't really the case here. None of these constructors are actually exposed; the JDK could change them if it wanted to. This isn't really an API design question, it's an implementation design question, because these aren't actually part of the `HashSet` API. – Louis Wasserman Mar 27 '15 at 20:46
  • @LouisWasserman fully agreed. I have updated the title. I am trying to reason this implementation, hence the question. If I did it in one of my implementations, I'd receive critiques on such a choice – vsnyc Mar 27 '15 at 20:52

1 Answers1

2

You are correct that this wasn't the best design choice.

to summarize:

HashSet has 3 constructors:

/**
 * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
 * the specified initial capacity and the specified load factor.
 *
 * @param      initialCapacity   the initial capacity of the hash map
 * @param      loadFactor        the load factor of the hash map
 * @throws     IllegalArgumentException if the initial capacity is less
 *             than zero, or if the load factor is nonpositive
 */
public HashSet(int initialCapacity, float loadFactor) {
    map = new HashMap<>(initialCapacity, loadFactor);
}

/**
 * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
 * the specified initial capacity and default load factor (0.75).
 *
 * @param      initialCapacity   the initial capacity of the hash table
 * @throws     IllegalArgumentException if the initial capacity is less
 *             than zero
 */
public HashSet(int initialCapacity) {
    map = new HashMap<>(initialCapacity);
}

/**
 * Constructs a new, empty linked hash set.  (This package private
 * constructor is only used by LinkedHashSet.) The backing
 * HashMap instance is a LinkedHashMap with the specified initial
 * capacity and the specified load factor.
 *
 * @param      initialCapacity   the initial capacity of the hash map
 * @param      loadFactor        the load factor of the hash map
 * @param      dummy             ignored (distinguishes this
 *             constructor from other int, float constructor.)
 * @throws     IllegalArgumentException if the initial capacity is less
 *             than zero, or if the load factor is nonpositive
 */
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<>(initialCapacity, loadFactor);
}

The last one has an extra parameter dummy which is not used and is only there to let the compiler distinguish between the 2 constructors with the same parameters. The only difference is to change the backing map implementation.

We've gotten a lot better at Object design since these classes were written.

If this was rewritten today, there would probably a constructor that took a Map instead so that any Map implementation could be used as a backing store:

HashSet(Map<K,V> backingMap);

And/or there could be multiple static factory methods with different names but the same parameters to

 public static HashSet create(int initialCapacity, float loadFactor)

 public static HashSet createLinked(int initialCapacity, float loadFactor)

EDIT

JB Nizet brings up an interesting point. HashSet's readObject() method has explicit code when reconstructing an object from an ObjectInputStream to see if "this" the instance is of LinkedHashSet or not.

// Create backing HashMap
    map = (((HashSet<?>)this) instanceof LinkedHashSet ?
           new LinkedHashMap<E,Object>(capacity, loadFactor) :
           new HashMap<E,Object>(capacity, loadFactor));

This is only really possible because of the dummy parameter version of the constructor is package private so those are the only 2 possible implementations currently. Without this technique, you wouldn't be able to use ReadObject() correctly.

This is probably why Josh Bloch wrote the Serialization Chapter in Effective Java. You would probably have to use a SerializationProxy (Item 78) to correctly have the LinkedHashSet read correctly.

dkatzel
  • 31,188
  • 3
  • 63
  • 67
  • LinkedHashSet didn't show up until Java 1.4, rather than Java 1.2. – Powerlord Mar 27 '15 at 20:26
  • There _is_ a set-as-map "constructor" in `Collections.newSetFromMap`. That said, the JDK doesn't actually _expose_ any of these constructors, so it's not tied to this API in any way; this could change in future versions of the JDK if anyone wanted to change it. – Louis Wasserman Mar 27 '15 at 20:45
  • @dkatzel The answer is fair and I would side with it (because it reflects my point of view :)). I do wish to hear a counter argument why this was a good choice, so leaving it open for now – vsnyc Mar 27 '15 at 21:37
  • @LouisWasserman often I look at JDK source code to learn from it and I am generally happy with what I see. `LinkedHashSet` implementation just came off as odd. What if the take-away is: "it's good to do so because I saw it in the JDK?" – vsnyc Mar 27 '15 at 21:40