
I am using a dictionary and I want to overwrite the GetHashCode function of the Key. But I need a 64-bit hashcode.

Is there any solution?

Soner Gönül
Masoud
  • No, [Object.GetHashCode](http://msdn.microsoft.com/library/system.object.gethashcode.aspx) can only return an `Int32`. But may I ask why you think you'd need an `Int64`? – Corak Sep 09 '13 at 11:01
  • Because of the nature of the key. With a 64-bit hash code I have a good approach to hash it, but with 32 bits I experienced a lot of collisions. – Masoud Sep 09 '13 at 11:25
  • `GetHashCode` is only a helper for hash tables. It's not supposed to be unique. It will always have collisions, e.g. for types like `long`. – Joey Sep 09 '13 at 11:29
  • Then maybe the class(es) you use for the key have a `GetHashCode` implementation that could be improved to not have as many collisions. – Corak Sep 09 '13 at 11:29
  • It is almost impossible to improve the implementation of gethashcode because of the nature of the object. – Masoud Sep 09 '13 at 11:35
  • @Masoud You really need to give more information. What is the nature of the object that makes it impossible? – Oskar Kjellin Sep 09 '13 at 11:36
  • A chessboard is 8x8 right? That's 64 *numbers*. That would fit into a single byte with room to spare! An `Int32` is four bytes. That should be a world of more space you would ever need. `Int64` sounds like absolute overkill. - What is your key and what does your `GetHashCode` look like? – Corak Sep 09 '13 at 12:05
  • @Corak No, a full chessboard with pieces is something like 320 bits. 17 possible states for each square (6 pieces + king that can castle + pawn that can be captured en passant = 8 * 2 colors = 16 + blank = 17), so 4.something bits, rounded up to 5. If you fully use all the bits it's about 260 bits. Clearly it's compressible because >= 50% of the chessboard is empty, so the "empty" state is by far the most common. – xanatos Sep 09 '13 at 12:11
  • Take a look at the answer to a similar problem here: http://stackoverflow.com/a/18605833/1336590. A chess position is no different from row/column – Corak Sep 09 '13 at 12:12
  • @xanatos - Oh, thanks, didn't know that. But the "bits" confuse me ^_^. You need 3 bits for the x-axis, another 3 for the y-axis and another 5 for the 17 states (sadly not 16 states...). So you would have 11 bits, which would comfortably fit into 2 bytes (16 bit), or am I missing something? – Corak Sep 09 '13 at 12:29
  • @Corak No, it stores the pieces directly, without the coordinates. So it stores (piece in A1, piece in A2, piece in A3... in B1, in B2...). The point is that with 64 bits collisions are so improbable that you can ignore them even if one occurs, so you don't need the `Equals`. – xanatos Sep 09 '13 at 12:36

2 Answers


No, you can't. The signature of GetHashCode is fixed as virtual int GetHashCode().

Note that Dictionary<TKey, TValue> does handle multiple items with the same hash code. You can try it by overriding GetHashCode like this:

public override int GetHashCode()
{
    return 0;
}

Note that this will make the dictionary quite slow (searching inside it becomes O(n) instead of O(1))!

Dictionary<,> handles multiple keys with the same hash code by comparing each of them with the Equals method (so it's a two-step process: first GetHashCode, then Equals between all the items with the same GetHashCode).
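To make the two-step lookup concrete, here is a minimal sketch. `CollidingKey` and `CollisionDemo` are hypothetical names invented for this illustration; the key deliberately returns a constant hash code so that only `Equals` can tell entries apart.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: every instance reports the same hash code,
// so the dictionary has to rely on Equals to tell keys apart.
public sealed class CollidingKey
{
    public readonly string Name;

    public CollidingKey(string name) { Name = name; }

    // Step 1: the hash code only narrows the search. Here all keys collide.
    public override int GetHashCode() { return 0; }

    // Step 2: Equals decides which colliding entry is the real match.
    public override bool Equals(object obj)
    {
        CollidingKey other = obj as CollidingKey;
        return other != null && Name == other.Name;
    }
}

public static class CollisionDemo
{
    public static Dictionary<CollidingKey, int> Build()
    {
        var map = new Dictionary<CollidingKey, int>();
        map[new CollidingKey("a")] = 1;
        map[new CollidingKey("b")] = 2;
        return map;
    }

    public static void Main()
    {
        Dictionary<CollidingKey, int> map = Build();
        Console.WriteLine(map.Count);                   // 2
        Console.WriteLine(map[new CollidingKey("b")]);  // 2
    }
}
```

Both entries survive and are found correctly even though their hash codes are identical; the cost is that every lookup has to walk through all colliding entries.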

To reduce a 64-bit hash to the 32-bit value that GetHashCode must return, you can simply:

long hash64 = ....;
int hash32 = hash64.GetHashCode();
return hash32;

:-)

or, if you prefer the long way:

long hash64 = ....;

unchecked
{
    int hash32 = ((int)hash64) ^ ((int)(hash64 >> 32));
    return hash32;
}
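As a small illustration of what the folding loses, the sketch below wraps the XOR expression above in a hypothetical helper named `Fold` and shows two different 64-bit values that fold to the same 32-bit result. This is only a demonstration under that assumption, not a claim about how often such collisions occur in practice.

```csharp
using System;

public static class HashFolding
{
    // XOR the low and high 32-bit halves of a 64-bit hash.
    public static int Fold(long hash64)
    {
        unchecked
        {
            return ((int)hash64) ^ ((int)(hash64 >> 32));
        }
    }

    public static void Main()
    {
        long a = 1L;        // low half = 1, high half = 0
        long b = 1L << 32;  // low half = 0, high half = 1

        Console.WriteLine(a == b);              // False
        Console.WriteLine(Fold(a) == Fold(b));  // True
    }
}
```

The 32-bit value is therefore only a first filter. As long as `Equals` compares the full 64-bit value, the dictionary still distinguishes `a` from `b`.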

If you are interested, here is an explanation of how Dictionary<,> works internally. Look under The System.Collections.Generic.Dictionary Class

I have done some research on Zobrist hashes... It seems that you should simply ignore the chances of collisions at 64 bits. If you want to simulate this, you could do something like:

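public class HashPiece
{
    public readonly long Hash64;
    public readonly Piece[] Board = new Piece[64];

    // Fold the 64-bit Zobrist hash down to the 32 bits the dictionary needs.
    public override int GetHashCode()
    {
        return Hash64.GetHashCode();
    }

    // Treat two positions as equal when their full 64-bit hashes match.
    public override bool Equals(object other)
    {
        HashPiece other2 = other as HashPiece;
        return other2 != null && this.Hash64 == other2.Hash64;
    }
}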

In this example you don't compare the Piece[] arrays; you simply trust that equal 64-bit hashes mean equal positions. Clearly, a safer alternative also compares the boards:

    public override bool Equals(object other)
    {
        HashPiece other2 = other as HashPiece;

        if (other2 == null || this.Hash64 != other2.Hash64)
        {
            return false;
        }

        // Requires `using System.Linq;` for SequenceEqual.
        return this.Board.SequenceEqual(other2.Board);
    }

Note that I've found anecdotal evidence that the quality of the random number generator, and the particular seed value used, can influence the number of collisions.
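For readers unfamiliar with Zobrist hashing, here is a rough sketch of how such a 64-bit hash is commonly built: one random 64-bit number per (square, piece) pair, XORed together. Everything here is an assumption made for illustration: the class name `ZobristSketch`, the 12 piece kinds (castling and en passant state are ignored), the use of `System.Random`, and the seed `12345`. Given the remark above about generator quality, a real engine may well want a stronger random source.

```csharp
using System;

public static class ZobristSketch
{
    const int Squares = 64;
    const int PieceKinds = 12; // assumption: 6 piece types x 2 colors

    // One random 64-bit number per (square, piece); the seed is arbitrary.
    static readonly ulong[,] Table = BuildTable(12345);

    static ulong[,] BuildTable(int seed)
    {
        var rng = new Random(seed);
        var buffer = new byte[8];
        var table = new ulong[Squares, PieceKinds];
        for (int s = 0; s < Squares; s++)
        {
            for (int p = 0; p < PieceKinds; p++)
            {
                rng.NextBytes(buffer);
                table[s, p] = BitConverter.ToUInt64(buffer, 0);
            }
        }
        return table;
    }

    // board[s] holds a piece index 0..11, or -1 for an empty square.
    public static ulong Hash(int[] board)
    {
        ulong h = 0;
        for (int s = 0; s < Squares; s++)
        {
            if (board[s] >= 0) h ^= Table[s, board[s]];
        }
        return h;
    }

    // Incremental update for a non-capturing move:
    // XOR the piece out of its old square and into its new one.
    public static ulong Move(ulong hash, int piece, int from, int to)
    {
        return hash ^ Table[from, piece] ^ Table[to, piece];
    }

    public static void Main()
    {
        var board = new int[Squares];
        for (int i = 0; i < Squares; i++) board[i] = -1;

        board[0] = 3;
        ulong before = Hash(board);

        board[0] = -1;
        board[8] = 3;
        ulong after = Hash(board);

        Console.WriteLine(Move(before, 3, 0, 8) == after); // True
    }
}
```

The incremental `Move` update is the usual reason to pick this scheme: a move changes the hash with two XORs instead of rehashing all 64 squares.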

xanatos
  • The problem is that with 32 bit hashcodes the collision is high. So 32-bit hash-code isn't good enough. – Masoud Sep 09 '13 at 11:24
  • @Masoud If your problem is that with 32 bits the collision is high then probably the problem is that your GetHashCode is poor. Probably you are wasting bits of hashcode, but to check this we have to see what you want to hash. – xanatos Sep 09 '13 at 11:27
  • But with `int hash32 = hash64.GetHashCode();` you are actually "compressing" the 64bit hash again in a 32bit hash. So i think that the collision probability is almost the same. – Alberto Sep 09 '13 at 11:32
  • +1 for "first `GetHashCode`, then `Equals`" - It's expected that there might be a lot of collisions with hash codes. But they're only the first step in narrowing down the potentially equal objects. So the next step is to really check for equality. Both `GetHashCode` and `Equals` should be implemented to be rather fast. – Corak Sep 09 '13 at 11:32
  • It is a chess position, and I used the best approach, a Zobrist hash. There is no equality check, so collisions should almost never happen. – Masoud Sep 09 '13 at 11:39
  • A chess position? That must be incredibly fast. I think you need to show your code – Oskar Kjellin Sep 09 '13 at 11:40
  • You could also just use a 2d array – Oskar Kjellin Sep 09 '13 at 11:43
  • @Masoud I'll say that you chose bad random numbers for Zobrist hash of the single pieces. – xanatos Sep 09 '13 at 11:48
  • It may be the reason. – Masoud Sep 09 '13 at 11:49
  • @Masoud See expanded response – xanatos Sep 09 '13 at 12:46

I have used the code below to generate a 64-bit hash code, mostly as a substitute for long, repetitive strings.

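 // Requires: using System; using System.Security.Cryptography; using System.Text;
 public long ComputeHash(String p_sInput)
 {
        // Dispose the hash algorithm when done.
        using (MD5 md5 = MD5.Create())
        {
            // UTF-8 keeps non-ASCII characters distinct; Encoding.ASCII would map them all to '?'.
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(p_sInput));
            // Keep the first 8 of the 16 MD5 bytes as the 64-bit value.
            return BitConverter.ToInt64(hash, 0);
        }
 }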
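One way such a 64-bit value could be used, sketched here as an assumption rather than something this answer describes, is as a `long` dictionary key. The dictionary still buckets by a 32-bit hash code internally, but key equality is checked on all 64 bits. `LongKeyDemo` and the sample values are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LongKeyDemo
{
    public static Dictionary<long, string> Build()
    {
        var cache = new Dictionary<long, string>();
        // Two distinct 64-bit keys; whatever 32-bit hash codes they get,
        // the dictionary compares the full long values for equality.
        cache[1L] = "first";
        cache[1L << 32] = "second";
        return cache;
    }

    public static void Main()
    {
        Dictionary<long, string> cache = Build();
        Console.WriteLine(cache.Count);      // 2
        Console.WriteLine(cache[1L << 32]);  // second
    }
}
```

Two different strings that happen to produce the same 64-bit value would still be treated as the same key, so this only suits cases where that risk is acceptable.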
nobody