I've been musing about this for some time: how exactly is Object.GetHashCode implemented in the CLR or Java? The contract for this method is that repeated calls on the same object instance should always return the same value.
Note that I'm talking about the default implementation of GetHashCode(). Derived classes are not required to override this method; if they don't, they get reference semantics: equality is "pointer equality" by default when the object is used in hash tables &c. This means that the runtime somehow has to provide a constant hash code for the object throughout its lifetime.
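To make that concrete, here is a minimal C# sketch (the `Point` type is a made-up example that deliberately overrides nothing, so it inherits the default behaviour from System.Object):

```csharp
using System;
using System.Collections.Generic;

// Made-up type that does NOT override Equals or GetHashCode,
// so it inherits the default reference semantics from System.Object.
class Point
{
    public int X, Y;
}

class Program
{
    static void Main()
    {
        var a = new Point { X = 1, Y = 2 };
        var b = new Point { X = 1, Y = 2 };

        // Same instance: the default hash code is stable across calls.
        Console.WriteLine(a.GetHashCode() == a.GetHashCode()); // True

        // Distinct instances with identical fields are distinct keys,
        // because equality is "pointer equality" by default.
        var table = new Dictionary<Point, string> { [a] = "first" };
        Console.WriteLine(table.ContainsKey(b)); // False
    }
}
```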
If the machine I'm running on is 32-bit, and if the object instance never moves in memory, one could in theory return the object's address, reinterpreted as an Int32. That would be nice, since all distinct objects have distinct addresses and would therefore get distinct hash codes.
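Just to spell out the scheme I mean (not anything the runtime actually does), here is a deliberately naive sketch; it assumes the System.Runtime.CompilerServices.Unsafe helpers available on modern .NET, and the name `AddressHash` is mine:

```csharp
using System;
using System.Runtime.CompilerServices;

static class NaiveIdentityHash
{
    // Hypothetical, deliberately flawed address-based hash: reinterpret the
    // object reference as a pointer-sized integer and truncate it to 32 bits.
    // The value is only a snapshot of where the object happens to live right now.
    public static int AddressHash(object obj)
    {
        IntPtr address = Unsafe.As<object, IntPtr>(ref obj);
        return unchecked((int)(long)address);
    }
}
```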
However, this approach is flawed, amongst other things because:

- If the garbage collector moves the object in memory, its address changes, and so would its hash code, violating the contract that the hash code must stay the same for the object's lifetime.
- On a 64-bit system, the object's address is too wide to fit into an Int32.
- Because managed objects are aligned on a power-of-two boundary (typically 4 or 8 bytes), the bottommost bits of the address are always zero, which can produce poor distribution when the hash codes are used to index into a hash table (see the sketch after this list).
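To see why those zero low bits matter, here is a tiny self-contained simulation (the "addresses" are synthetic multiples of 8, not real object addresses): with a 16-bucket table, such values land in only 2 of the 16 buckets.

```csharp
using System;
using System.Linq;

class AlignmentDemo
{
    static void Main()
    {
        const int bucketCount = 16;                         // typical power-of-2 table size
        var addresses = Enumerable.Range(0, 1000)
                                  .Select(i => (long)i * 8); // synthetic 8-byte-aligned addresses

        // Using the raw address as the hash, count how many buckets ever get used.
        int usedBuckets = addresses.Select(a => a % bucketCount)
                                   .Distinct()
                                   .Count();
        Console.WriteLine(usedBuckets); // 2 (only buckets 0 and 8)
    }
}
```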
In .NET, a System.Object consists of nothing more than a sync block and a type handle, so the hash code cannot be cached in the instance itself. Yet somehow the runtime is able to provide a persistent hash code. How? And how do Java, Mono, and other runtimes do it?
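For what it's worth, here is a small repro of the property that puzzles me: even after forcing collections (during which a compacting GC may well relocate the object), the default hash code stays the same.

```csharp
using System;

class Program
{
    static void Main()
    {
        var o = new object();
        int before = o.GetHashCode();

        // Force garbage collections; a compacting GC may move the object,
        // yet the default hash code must not change.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        Console.WriteLine(before == o.GetHashCode()); // True
    }
}
```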