0

Let's say I have this example:

using System.Collections.Generic;

public class Player
{
    public string Username { get; set; }

    // Compares players by Username only.
    private sealed class PlayerEqualityComparer : IEqualityComparer<Player>
    {
        public bool Equals(Player x, Player y)
        {
            if (ReferenceEquals(x, y)) return true;
            if (ReferenceEquals(x, null)) return false;
            if (ReferenceEquals(y, null)) return false;
            if (x.GetType() != y.GetType()) return false;
            return x.Username == y.Username;
        }

        // Must agree with Equals: players with equal usernames get equal hash codes.
        public int GetHashCode(Player obj)
        {
            return (obj.Username != null ? obj.Username.GetHashCode() : 0);
        }
    }

    public static IEqualityComparer<Player> Comparer { get; } = new PlayerEqualityComparer();
}

I have a doubt about GetHashCode: its returned value depends on the hash of Username but we know that even if two strings contain the same value, their hash is computed by their reference, generating a different Hash.

Now if I have two Players like this:

Player player1 = new Player {Username = "John"};
Player player2 = new Player {Username = "John"};

By Equals they're the same, but by GetHashCode they likely are not. What happens, then, when I use this PlayerEqualityComparer in an Except or Distinct call? Thank you

Alessandro
  • `if two strings contain the same value, their hash is computed by their reference, generating a different Hash`: no, strings with the same value get the same hash code from GetHashCode. – Renat May 31 '23 at 07:31
  • If something you "know" would lead to obvious problems (if strings worked how you thought, they could never be the keys in `Dictionary`), I'd strongly recommend you recheck your assumptions first. – Damien_The_Unbeliever May 31 '23 at 07:39

2 Answers

3

Two strings with the same value are guaranteed to have the same hash code. Otherwise string.GetHashCode would be broken, and someone would have noticed by now.

> but we know that even if two strings contain the same value, their hash is computed by their reference

I'm not sure what you mean here, but it's wrong. The hash code is derived from the string's contents, i.e. its value, not from its reference. The documentation states:

> Important
>
> If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.
>
> The hash code itself is not guaranteed to be stable. Hash codes for identical strings can differ across .NET implementations, across .NET versions, and across .NET platforms (such as 32-bit and 64-bit) for a single version of .NET. In some cases, they can even differ by application domain. This implies that two subsequent runs of the same program may return different hash codes.
>
> As a result, hash codes should never be used outside of the application domain in which they were created, they should never be used as key fields in a collection, and they should never be persisted.

In general, the following must hold:

  • If two objects are equal, GetHashCode must return the same value for both.
  • If two objects are not equal, GetHashCode does not have to return different values. It usually does, and fewer collisions are better for performance, but correctness does not depend on it.
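
To make the contract concrete for the question's scenario, here is a minimal sketch. It assumes C# 9 or later (top-level statements), and `UsernameComparer` is a hypothetical stand-in for the question's private `PlayerEqualityComparer`, reduced to the Username check:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var comparer = new UsernameComparer();
var player1 = new Player { Username = "John" };
var player2 = new Player { Username = "John" };

// Equal by the comparer, so the hash codes are required to match, and they do.
Console.WriteLine(comparer.Equals(player1, player2));                              // True
Console.WriteLine(comparer.GetHashCode(player1) == comparer.GetHashCode(player2)); // True

var players = new[] { player1, player2, new Player { Username = "Jane" } };

// Distinct keeps one player per username: one "John" and one "Jane".
Console.WriteLine(players.Distinct(comparer).Count());                             // 2

// Except drops every player the comparer considers equal to player2.
Console.WriteLine(players.Except(new[] { player2 }, comparer).Count());            // 1

public class Player
{
    public string Username { get; set; }
}

// Hypothetical stand-in for the question's PlayerEqualityComparer.
public sealed class UsernameComparer : IEqualityComparer<Player>
{
    public bool Equals(Player x, Player y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Username == y.Username;
    }

    // Derived from the username's value, so it agrees with Equals.
    public int GetHashCode(Player obj) => obj.Username?.GetHashCode() ?? 0;
}
```

So with a comparer like the one in the question, Distinct and Except treat both "John" players as the same element.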

Tim Schmelter

1

A hash code does not have to be unique. In fact, in general it cannot be: there are far more possible strings than there are int values.

Hash codes are used for optimization. Fewer collisions mean better performance, because objects with different hash codes can be told apart immediately. If two different objects happen to share a hash code, they can still be distinguished by calling Equals, just less efficiently.

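Hash codes matter most in dictionaries and sets. As far as I know, Except and Distinct are built on a hash set internally, although that is an implementation detail. Either way, the result is correct as long as GetHashCode() is consistent with Equals, that is, as long as equal objects return equal hash codes. A poor hash function that collides a lot only makes things slower, whereas one that returns different values for equal objects can make Distinct and Except miss duplicates. If performance matters, try different variants and measure.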
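
As a rough illustration of "correct but slower", here is a sketch with the worst hash function that is still legal: a constant. It assumes C# 9 or later, and `ConstantHashComparer` is a hypothetical name introduced only for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new[] { "John", "Jane", "John", "Jack", "Jane" };

// Every element collides, so Equals has to settle each comparison.
// The result is still correct; it just takes more comparisons to get there.
Console.WriteLine(names.Distinct(new ConstantHashComparer()).Count()); // 3

// Hypothetical comparer: equal strings trivially share a hash code,
// because all strings share the same one.
public sealed class ConstantHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => string.Equals(x, y, StringComparison.Ordinal);

    public int GetHashCode(string obj) => 0;
}
```

The contract is satisfied because equal strings get equal hash codes, so only the speed suffers on large inputs.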

> but we know that even if two strings contain the same value, their hash is computed by their reference, generating a different Hash.

That is not true. The reference is not taken into account when computing a string's hash code. That would be insanely broken.

It's the exact opposite: two equal strings generate the same hash code, even if they are not reference-equal. One of the fundamental rules is that if two objects are equal, they must have the same hash code.
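
This is easy to check directly. The sketch below assumes C# 9 or later and builds the second string from a char array at run time so that it is a separate instance from the literal:

```csharp
using System;

string a = "John";
string b = new string(new[] { 'J', 'o', 'h', 'n' }); // separate instance, same contents

Console.WriteLine(object.ReferenceEquals(a, b));       // False: two different objects
Console.WriteLine(a == b);                             // True: same value
Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True: hash comes from the value
```

The actual hash values may differ from one run of the program to the next, but within a run they always match for equal strings.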

freakish