
I am trying to construct a set of sets in the mathematical sense. However, if I add another set that is equal to one already in the set of sets but is represented by a different object, the set gets duplicated.

HashSet<HashSet<string>> setOfSets = new HashSet<HashSet<string>>();
HashSet<string> set1 = new HashSet<string>();    
HashSet<string> set2 = new HashSet<string>();

set1.Add("Foo");
set1.Add("Bar");
set1.Add("Bar"); // set behavior okay, "Bar" is not duplicated

set2.Add("Foo");
set2.Add("Bar"); // now set1 == set2

setOfSets.Add(set1);
setOfSets.Add(set2); // now set1 AND set2 are in setOfSets, which is "wrong"

Why does the set logic work for equality of strings but not for equality of HashSets themselves? How can I fix this with the least effort?

oliver
  • The duplicate question answers your second question. The answer to your first question is: Hash sets use reference equality because they are mutable reference types; strings use value equality because, though reference types, they behave logically like immutable value types. – Eric Lippert Apr 17 '18 at 17:45
  • HashSet, like all collection classes, does not override GetHashCode+Equals. Too expensive to implement. So you only get a match for identical object references. Consider the Except() method as the setty way to check if one set has objects that another one doesn't have. – Hans Passant Apr 17 '18 at 17:45
  • It's not just that it's expensive, though it certainly is. Logically we would like things that are equal to be substitutable without changing the meaning of the program, but that's not true when you can mutate the containers. There is an argument to be made that immutable collection types should have value equality even when doing so is expensive. – Eric Lippert Apr 17 '18 at 17:48
  • I agree that this is not an exact duplicate. The answer you've constructed and added to the question is valuable. You should now be able to post it as an answer. The explanation that Eric Lippert provided deserves to be in an answer as well if you wish to include it. – Jeffrey L Whitledge Apr 17 '18 at 18:21

1 Answer

I have been pointed to this question. Although the accepted answer is helpful for checking the equality of sets by hand (by means of HashSet<T>.SetEquals(IEnumerable<T>)), it doesn't quite help with applying this equality logic to a set of sets.
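For illustration, here is a minimal sketch of the "by hand" check, contrasted with the default Equals behavior. It assumes C# 9 or later for top-level statements; the variable names are only for this example.

```csharp
using System;
using System.Collections.Generic;

var set1 = new HashSet<string> { "Foo", "Bar" };
var set2 = new HashSet<string> { "Bar", "Foo" };

// HashSet<T> does not override Equals, so this compares object references.
bool sameReference = set1.Equals(set2);

// SetEquals compares the contents, ignoring order and duplicates.
bool sameElements = set1.SetEquals(set2);

Console.WriteLine(sameReference); // False
Console.WriteLine(sameElements);  // True
```

This works for a single pair of sets, but an outer HashSet never calls SetEquals on its own.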

However, the non-accepted answer (by Gregory Adam) gives the crucial hint as to how this can be accomplished, namely with HashSet<string>.CreateSetComparer(). Because HashSet has a constructor that accepts an equality comparer, this is the way to go:

HashSet<HashSet<string>> setOfSets = 
    new HashSet<HashSet<string>>(HashSet<string>.CreateSetComparer());

This tells the outer HashSet how to "properly" (in the mathematical sense) compare objects of the inner HashSet's type (in my case HashSet<string>).
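As a quick check, the example from the question can be rerun with the set comparer in place. This is a sketch assuming C# 9 or later for top-level statements; addedFirst and addedSecond are names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;

// The outer set compares its elements by content instead of by reference.
var setOfSets = new HashSet<HashSet<string>>(HashSet<string>.CreateSetComparer());

var set1 = new HashSet<string> { "Foo", "Bar" };
var set2 = new HashSet<string> { "Bar", "Foo" };

bool addedFirst = setOfSets.Add(set1);
bool addedSecond = setOfSets.Add(set2); // rejected: an equal set is already present

Console.WriteLine(addedFirst);      // True
Console.WriteLine(addedSecond);     // False
Console.WriteLine(setOfSets.Count); // 1

// A lookup with a fresh, equal instance succeeds as well.
Console.WriteLine(setOfSets.Contains(new HashSet<string> { "Foo", "Bar" })); // True
```

Add returns false for set2, so the outer set keeps only set1.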

As Hans Passant and Eric Lippert have kindly pointed out, equality comparison of HashSets is comparatively expensive, even more so if it is applied to a nested HashSet, which might be one of the reasons why this hasn't been chosen as the default behavior.

The main reason for choosing reference equality, however, according to Eric Lippert, is the mutability of HashSet objects. Applied to my example: if I add two set-wise different HashSets (that is, !set1.SetEquals(set2)) to my setOfSets and afterwards change set1's contents so that it becomes equal to set2, there are still two sets in setOfSets, although there should then be only one. This leads to a state that is inconsistent with the original requirement ("there shall not be two equal objects in a set").
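The scenario just described can be reproduced with a short sketch, again assuming C# 9 or later for top-level statements. It uses the set comparer for the outer set, so the inconsistency comes purely from mutating an element after it was added.

```csharp
using System;
using System.Collections.Generic;

var setOfSets = new HashSet<HashSet<string>>(HashSet<string>.CreateSetComparer());

var set1 = new HashSet<string> { "Foo" };
var set2 = new HashSet<string> { "Foo", "Bar" };

setOfSets.Add(set1);
setOfSets.Add(set2); // accepted: the contents differ at this point

// Mutating set1 after insertion makes it equal to set2.
set1.Add("Bar");

Console.WriteLine(set1.SetEquals(set2)); // True
Console.WriteLine(setOfSets.Count);      // 2
```

The outer set is not notified of the change, so it still holds two entries that now compare as equal. With this comparer, it seems safest to treat sets as read-only once they have been added to the outer set.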

The same cannot be done with strings because strings are immutable. If I add two different string objects "Foo" and "Bar" to a HashSet, there is no legal way to change them to become equal.

oliver