Why doesn't GetHashCode take advantage of the sn.exe tool's hash algorithm?

Question

MSDN says:

"The default implementation of the GetHashCode method does not guarantee unique return values for different objects. "

On the other hand, when I use the sn.exe tool, it ensures a unique hash value to create a strongly-named assembly. If I understand correctly, the entire content of the assembly is converted to a hash value.

So why doesn't the default implementation of GetHashCode() use the same algorithm that sn.exe uses to create unique hash values for objects, instead of expecting the developer to implement it?

Do you mean [sn.exe](http://msdn.microsoft.com/en-us/library/k5b5tt23.aspx)? — dtb, Feb 26 '12 at 14:58
That does not make sense at all. An `int` can't guarantee uniqueness, it's *far* too short. And a cryptographic hash is much more expensive than what you want for `GetHashCode()`. And `GetHashCode()` must match `Equals`. So there is no way around overriding it when you override `Equals`. — CodesInChaos, Feb 26 '12 at 15:07
What in the world is SK.exe? Even if you meant Sn.exe, what's the relation to GetHashCode? — bobbyalex, Feb 26 '12 at 15:55
@Bobby: Thanks; I have corrected the typo. On the other hand, I did not say there is a relationship between them. Henk and Hans seem to get my point. — pencilCake, Feb 26 '12 at 16:11

H H · Answer 1 · 2012-02-26T15:27:06.947

Those are two entirely different things.

The GetHashCode() function by definition returns (only) a 32-bit integer. It is supposed to use a fast algorithm and does not (cannot) guarantee uniqueness. A PC can quickly generate enough strings to show a collision.

When you sign an application (document) you end up with a much larger hash (such as 128 or 256 bits). While in theory you might still have a collision, this has no practical implications.
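
To make the point about collisions concrete, here is a minimal sketch that hashes the strings "0", "1", "2", ... until two different strings share a hash code. The class name `HashCollisionDemo` and the method `FindCollision` are hypothetical names chosen for this illustration. The exact pair found depends on the runtime's string hashing, so no particular result is claimed.

```csharp
using System;
using System.Collections.Generic;

public static class HashCollisionDemo
{
    // Hashes the strings "0", "1", "2", ... until two different strings
    // share a hash code, or until maxAttempts strings have been tried.
    // Returns the colliding pair, or null if no collision was found.
    public static Tuple<string, string> FindCollision(int maxAttempts)
    {
        // Maps each hash code seen so far to the string that produced it.
        var seen = new Dictionary<int, string>();
        for (int i = 0; i < maxAttempts; i++)
        {
            string candidate = i.ToString();
            int hash = candidate.GetHashCode();
            string earlier;
            if (seen.TryGetValue(hash, out earlier))
            {
                // Two distinct strings, one hash code.
                return Tuple.Create(earlier, candidate);
            }
            seen[hash] = candidate;
        }
        return null;
    }
}
```

Calling `HashCollisionDemo.FindCollision(5000000)` should normally return a pair well before the limit is reached, assuming the string hash is reasonably well distributed.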

Hans Passant · Accepted Answer · 2012-02-26T15:49:57.077

Not enough bits. GetHashCode() returns 32 of them so there can never be more than 4 billion distinct values. The birthday paradox cuts that down considerably. The strong name generated by sn.exe (not sk.exe) uses a SHA1 hash. Which returns 160 bits, allowing for 2^160 distinct values.

Which is a Really Big Number (about 1.46E48), ensuring uniqueness by the sheer quantity. Somewhat similar to a Guid, which uses 128 bits. Not the same though: a Guid generator is designed to avoid duplicates, whereas SHA1 offers no such guarantee.
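
As a rough worked estimate of the birthday effect mentioned above, assume hash values are independent and uniformly distributed over $N$ possible values (an idealization, real hash functions only approximate it). The probability of at least one collision among $n$ values, and the $n$ at which that probability reaches one half, are then approximately:

$$p(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2N}\right), \qquad n_{1/2} \approx \sqrt{2N\ln 2} \approx 1.1774\,\sqrt{N}$$

$$N = 2^{32}: \; n_{1/2} \approx 1.1774 \times 2^{16} \approx 77{,}163 \qquad\qquad N = 2^{160}: \; n_{1/2} \approx 1.1774 \times 2^{80} \approx 1.42 \times 10^{24}$$

So with 32 bits a collision becomes likely after tens of thousands of values, while with 160 bits it takes on the order of $10^{24}$.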

GetHashCode has a limited number of bits because the primary requirement for the method is that it is fast. Apart from providing the bucket index for hashed collections, its use is making equality testing fast. GetHashCode needs to be an order of magnitude faster than Equals(), give or take, to be useful. That requires cutting many corners. For example, the GetHashCode implementation for a struct that contains reference types typically returns only the GetHashCode value of the first member.
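
A minimal sketch of what a fast hash that agrees with Equals can look like. The type `GridPoint` and the multiplier 397 are assumptions made for this illustration, not anything prescribed by the framework. The hash is cheap to compute and may collide, so Equals still has the final say.

```csharp
using System;

// Hypothetical example type: two GridPoints are equal when both coordinates match.
public sealed class GridPoint
{
    private readonly int x;
    private readonly int y;

    public GridPoint(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    public override bool Equals(object obj)
    {
        var other = obj as GridPoint;
        return other != null && x == other.x && y == other.y;
    }

    // Cheap to compute and consistent with Equals: equal points always
    // produce equal hash codes. Unequal points may still collide.
    public override int GetHashCode()
    {
        unchecked
        {
            return (x * 397) ^ y;
        }
    }
}
```

A hashed collection such as `HashSet<GridPoint>` uses the hash code to pick a bucket and then calls Equals only on the few candidates in that bucket.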

score 1 · Answer 3 · answered Jun 16 '13 at 02:24

There's no limit to the number of objects a program can create, call GetHashCode() upon, and abandon. There is, however, a limit of 4,294,967,296 different values GetHashCode() can return. If a program happens to call GetHashCode 4,294,967,297 times, at least one of those calls would have to return a value that had already been returned previously.

It would theoretically be possible for the system to keep a pool of hash-code values, and for objects which are abandoned to have their hash codes put back in the pool so that GetHashCode() could guarantee that it will never return the same value as any other live object (assuming there are no more than 4,294,967,296 live objects, at least). On the other hand, keeping such information would be expensive and not really offer much benefit. From a practical perspective, it's just as good to have the system generate an arbitrary number either when an object is constructed or the first time GetHashCode() is called upon it. There will be occasional collisions, but generally not enough to bother well-written code.

BTW, I've sometimes thought it would be useful for each object to have a 64-bit ID which would be guaranteed unique, and which would also rank objects in order of creation. A 64-bit ID would never overflow within the lifetime of any foreseeable program, and being able to assign objects a ranking could be helpful in some caching or interning scenarios. For example, if a program generates some large objects by reading data from files, and frequently scans them to find differences, it may often find objects that contain identical data but are distinct. If two distinct objects are found to be identical and interchangeable, replacing references to the newer one with references to the older one may considerably expedite future comparisons among them; if many matching objects are compared among each other, many of the references to newer objects will get replaced with references to the oldest ones, without having to explicitly cache anything. Absent some means of determining "age", however, such an approach wouldn't really work, since there would be no way to know which reference should be abandoned in favor of the other.
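
The 64-bit ID idea can be approximated in user code. The following is only a sketch under stated assumptions: the class `ObjectIds` and its method `GetId` are hypothetical, and IDs are handed out the first time `GetId` is called on an object, so they rank objects by first request rather than by true creation order. It relies on `ConditionalWeakTable`, which keys on reference identity and does not keep the objects alive.

```csharp
using System.Runtime.CompilerServices;
using System.Threading;

public static class ObjectIds
{
    // ConditionalWeakTable values must be reference types, so the ID is boxed in a holder.
    private sealed class IdHolder
    {
        public readonly long Id;
        public IdHolder(long id) { Id = id; }
    }

    private static long lastId;

    private static readonly ConditionalWeakTable<object, IdHolder> Ids =
        new ConditionalWeakTable<object, IdHolder>();

    // Returns the ID already attached to obj, or attaches the next ID if it has none.
    // Under concurrent first calls for the same object the callback may run more
    // than once, which can leave gaps in the sequence but never duplicates.
    public static long GetId(object obj)
    {
        return Ids.GetValue(obj, _ => new IdHolder(Interlocked.Increment(ref lastId))).Id;
    }
}
```

With such IDs, code that finds two interchangeable objects could keep the reference with the smaller ID and drop the other.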

score 0 · Answer 4 · answered Feb 26 '12 at 15:42

Unrelated. Wonder how you could relate these two!!

Still, to add more argument:

A hash code 'cannot guarantee' uniqueness for different values. But it does 'guarantee' the same hash code for a given value/object. That means:

    var hashOne = "SO".GetHashCode();
    var hashTwo = "SO".GetHashCode();
    Debug.Assert(hashOne == hashTwo); // The assertion succeeds.

SN just generates a random unique number, with no logic over an instance.

Manish Basantani

`SN just generates a random unique number, with no logic over an instance` Are you sure about that? All cryptographic hash functions take a byte[] as input, which can be thought of as an `instance`. – L.B Feb 26 '12 at 16:08
