4

In this situation where one member is edited to become equal to another, what is the proper way to force the HashSet to recalculate hashes and thereby purge itself of duplicates?

I knew better than to expect this to happen automatically, so I tried such things as intersecting the HashSet with itself, then reassigning it to a constructor call which refers to itself and the same EqualityComparer. I thought for sure the latter would work, but no.

One thing which does succeed is reconstructing the HashSet from its conversion to some other container type such as List, rather than directly from itself.

Class defs:

public class Test {
    public int N;
    public override string ToString() { return this.N.ToString(); }
    }
public class TestClassEquality: IEqualityComparer<Test> {
    public bool Equals(Test x, Test y) { return x.N == y.N; }
    public int GetHashCode(Test obj) { return obj.N.GetHashCode(); }
    }

Test code:

    TestClassEquality eq = new TestClassEquality();
    HashSet<Test> hs = new HashSet<Test>(eq);
    Test a = new Test { N = 1 }, b = new Test { N = 2 };
    hs.Add(a);
    hs.Add(b);
    b.N = 1;
    string fmt = "Count = {0}; Values = {1}";
    Console.WriteLine(fmt, hs.Count, string.Join(",", hs));
    hs.IntersectWith(hs);
    Console.WriteLine(fmt, hs.Count, string.Join(",", hs));
    hs = new HashSet<Test>(hs, eq);
    Console.WriteLine(fmt, hs.Count, string.Join(",", hs));
    hs = new HashSet<Test>(new List<Test>(hs), eq);
    Console.WriteLine(fmt, hs.Count, string.Join(",", hs));

Output:

"Count: 2; Values: 1,1"
"Count: 2; Values: 1,1"
"Count: 2; Values: 1,1"
"Count: 1; Values: 1"

Based on the final approach succeeding, I could probably create an extension method in which the HashSet dumps itself into a local List, clears itself, and then repopulates from said list.

Is that really necessary or is there some simpler way to do this?

Mureinik
  • 297,002
  • 52
  • 306
  • 350
  • 5
    The problem here is that you're doing something that is explicitly not supposed to happen, keys used for hashing and dictionaries are required to not change. Therefore nobody has made it easy to handle this situation. – Lasse V. Karlsen Jun 08 '18 at 17:44

3 Answers3

11

Lasse's comment is correct: you are required by the contract of HashSet to not do this, so asking what to do when you do this is a non-starter. If it hurts when you do that, stop doing that. A mutable object must not be put into a hash set if a mutation will cause its hash value to change while it is in the set. You're in a cleft stick of your own making.

To get out of that cleft stick, you could:

  • Stop mutating the objects while they are in a hash set. Remove them before you mutate them, put them back in later.
  • Fix the implementation of equality and hashing on the object so that it is consistent across mutations.
  • When you create the hash set, provide a custom hashing/equality algorithm that does not change its opinions when the object is mutated.
  • Implement your own "set" class that has whatever behaviour you like in this scenario. That is extremely difficult, so be careful. (There is a reason why this restriction was created in the first place!)
Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • Thanks. The TLDR non-MVCE version is that I originally had a Dictionary where the key string was also the "Name" property of Foo -- but then realized that was redundant and complicated the unforeseen need to rename said Foo from the calling environment. So I switched to a HashSet with equality based on said Name and had the above problems. Current version uses a wrapper class with a private List and accessor functions for the desired by-name lookup, renaming, and avoidance of duplicates. Count is expected to remain too low for linear lookup efficiency to be a concern. – Custer Barnes Jun 10 '18 at 22:11
3

There is no other way than recreating the HashSet<>. Sadly the HashSet<> constructor has an optimization so that if it is create from another HashSet<> it copies the hash codes... So we can cheat:

hs = new HashSet<Test>(hs.Skip(0), eq);

The hs.Skip(0) is a IEnumerable<>, not an HashSet<>. This defeats the HashSet<> check.

Note that there is no guarantee that in the future the Skip() won't implement a shortcircuit in case of 0, something like:

if (count == 0)
{
    return enu;
}
else
{
    return count elements;
}

(see Lippert's comment, false problem)

The "manual" method to do it is:

var hs2 = new HashSet<Test>(eq);
foreach (var value in hs)
{
    hs2.Add(value);
}
hs = hs2;

So enumerate "manually" and readd.

xanatos
  • 109,618
  • 12
  • 197
  • 280
  • The other thing I just discovered is that using a different instance of the same EqualityComparer class will _also_ avoid this optimization. `hs = new HashSet(hs, eq2);` – Custer Barnes Jun 08 '18 at 17:41
  • @CusterBarnes Yes. See [this](https://github.com/dotnet/corefx/blob/master/src/System.Collections/src/System/Collections/Generic/HashSet.cs#L125) line of the source of `HashSet<>` – xanatos Jun 08 '18 at 17:43
  • 2
    I like that you're thinking carefully about this, but you don't actually have to worry about it. The standard implementation of `Skip(0)`, `Select(x=>x)` and so on are *required* to not be reference-equals to the underlying collection, so you don't have to worry that they will change. LINQ was designed so that you cannot "cast away" a query to get back to the original object; the developer may be attempting to conceal an underlying mutable object from its consumer, that needs an immutable view on the collection. – Eric Lippert Jun 08 '18 at 18:34
2

As you saw, HashSets don't deal with mutable objects when modifying the object affects its hash code or equality to other objects. Just remove it and re-add it:

hs.Remove(b);
b.N = 1;
hs.Add(b);
Mureinik
  • 297,002
  • 52
  • 306
  • 350
  • 2
    Note that you have to do the removal *before* the mutation, as you've done here. If the object is mutated such that the hash is different then *you can't remove it*. That's the whole reason why this is illegal in the first place! – Eric Lippert Jun 08 '18 at 18:35