
I know that nowadays built-in utilities are available, like HashCodeBuilder from Apache Commons Lang, but I was trying to understand how to implement it myself, and came across the example hashCode function for an Employee class at http://en.wikipedia.org/wiki/Java_hashCode()

Everywhere on Google, the same kind of technique is suggested: multiply a non-zero value by an odd prime number and then add an instance variable, repeating this for each instance variable.

Questions:

1) Why can't we return employeeId as the hashCode, since it will always be unique? It's simple and serves the hashCode purpose. Agreed, if it is not unique we probably need that kind of technique. Is that right?

2) Even if the employee id is not unique, why is it suggested to multiply by an odd prime number? Why is taking just any integer not considered good?

Update:

Peter, I ran the example you mentioned and it printed

[0, 32, 64, 96, 128, 160, 192, 224, 288, 256, 352, 320, 384]

[0, 32, 64, 96, 128, 160, 192, 224, 288, 256, 352, 320, 384]

I assume that, for now, the output you expected in order to understand the concept, as you mentioned in your answer, was

[373, 343, 305, 275, 239, 205, 171, 137, 102, 68, 34, 0]

[0, 34, 68, 102, 137, 171, 205, 239, 275, 305, 343, 373]

As you suggested in your comment, this example demonstrates that even unique hashCodes can end up in the same bucket. How does this example demonstrate that behaviour? Do you mean 373 for integers and 0 for integers2 end up in the same bucket?

How is the prime number helping in this example, and how would 34 not have helped?

M Sach

1 Answer


Why can't we return employeeId as the hashCode, since it will always be unique? It's simple and serves the hashCode purpose. Agreed, if it is not unique we probably need that kind of technique. Is that right?

Its uniqueness is not important. Multiplying by a prime is a good way of merging multiple fields into one hashCode, but it sounds like you only have one field, so it won't make much difference.
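To illustrate the multiply-by-prime pattern the answer refers to, here is a minimal sketch. The Employee fields and values are hypothetical, not taken from the question; the structure follows the common convention (non-zero seed, odd prime 31) described above.

```java
// Hypothetical Employee class showing the usual way several fields are
// folded into a single hashCode: start from a non-zero seed and, for each
// field, multiply the running result by an odd prime and add the field's hash.
class Employee {
    private final int employeeId;
    private final String name;

    Employee(int employeeId, String name) {
        this.employeeId = employeeId;
        this.name = name;
    }

    @Override
    public int hashCode() {
        int result = 17;                                   // non-zero seed
        result = 31 * result + employeeId;                 // 31 is an odd prime
        result = 31 * result + (name == null ? 0 : name.hashCode());
        return result;
    }

    public static void main(String[] args) {
        // Equal field values produce equal hashCodes.
        System.out.println(new Employee(42, "Alice").hashCode());
    }
}
```

With a single field such as employeeId, this collapses to little more than a constant offset of the id itself, which is why the answer says the prime makes little difference in that case.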

Even if the employee id is not unique, why is it suggested to multiply by an odd prime number? Why is taking just any integer not considered good?

If you multiply by an even number what will the lowest bit of the hashCode be? How random/useful is it?
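The point can be made concrete with a small demonstration (the ids here are made up): with an even multiplier the lowest bit of the result is always 0, so a power-of-two-sized hash table throws away one bit of information, while an odd multiplier preserves it.

```java
// Demonstration: an even multiplier (32) forces the lowest bit of every
// hash to 0, whereas an odd prime (31) lets the lowest bit vary with the id.
public class EvenMultiplierDemo {
    public static void main(String[] args) {
        for (int id = 1; id <= 5; id++) {
            int evenHash = 32 * id;   // even multiplier: lowest bit always 0
            int oddHash  = 31 * id;   // odd prime: lowest bit follows the id
            System.out.println(id + ": even low bit = " + (evenHash & 1)
                    + ", odd low bit = " + (oddHash & 1));
        }
    }
}
```

Since a HashMap of capacity 16 uses only the low 4 bits of the (spread) hash to pick a bucket, a dead lowest bit immediately halves the number of usable buckets.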


Note: every hashCode() for an Integer is unique, but with the right combination of integer values, when they are reduced to the capacity of a HashMap, they can all map to the same bucket. In this example, the entries appear in the reverse of the order they were added, because every entry mapped to the same bucket.

// Collect, in ascending order, the values 0..400 whose spread hash
// falls in bucket 0 of a 32-bucket table.
HashSet<Integer> integers = new HashSet<>();
for (int i = 0; i <= 400; i++)
    if ((hash(i) & 0x1f) == 0)
        integers.add(i);
// Collect the same values in descending order.
HashSet<Integer> integers2 = new HashSet<>();
for (int i = 400; i >= 0; i--)
    if ((hash(i) & 0x1f) == 0)
        integers2.add(i);
System.out.println(integers);
System.out.println(integers2);


static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

prints

[373, 343, 305, 275, 239, 205, 171, 137, 102, 68, 34, 0]
[0, 34, 68, 102, 137, 171, 205, 239, 275, 305, 343, 373]
Peter Lawrey
  • "If you multiply by an even number what will the lowest bit of the hashCode be? How random/useful is it?" Can you elaborate on that? Thanks in advance :) – M Sach Oct 06 '13 at 13:52
  • Even if we have multiple fields, why is multiplying by a prime a good way of merging multiple fields into one hashCode? Why not only empId? I think the answer lies in "its uniqueness is not important", but I didn't get what that means. As per my understanding, uniqueness is important so that unequal objects get stored under separate buckets in hash-based data structures? – M Sach Oct 06 '13 at 13:56
  • If you multiply by an even number, you get an even number so the lowest bit is always 0 and of no value. – Peter Lawrey Oct 06 '13 at 13:59
  • You want a randomised hashCode, and a prime number is the least likely to produce some pattern which means not all values are equal. Imagine you use 9 and a Hashtable which is a multiple of 3 in size. This will result in 2 out of every 3 buckets not being used, or worse, possibly only one in 9 buckets being used. The smaller the prime factors, the more likely you will have some poorly performing situation. – Peter Lawrey Oct 06 '13 at 14:03
  • Multiplying one value by a prime will not make it any more random. – Peter Lawrey Oct 06 '13 at 14:03
  • The uniqueness of your fields is not something you can change in the hashCode(), but you can avoid making it worse. BTW keys have to be unique. – Peter Lawrey Oct 06 '13 at 14:04
  • Even if the hashCode is unique, you don't have 2^32 buckets. This means the hashCode has to be reduced to a smaller set of values. The default size for HashMap is 16, so just 4 bits will be used (so clearly an even hashCode is a bad idea) – Peter Lawrey Oct 06 '13 at 14:06
  • @MSach Added an example of why unique hashCodes are not enough to avoid a hash collection degrading. – Peter Lawrey Oct 06 '13 at 14:16
  • Peter, could you point me to some article so that I can understand in detail all the stuff you are trying to convey? My understanding is that it is always better to return a unique hashCode for unequal objects so that they can go to separate buckets (different indexes in the array), which provides good performance on retrieval (as we can get the array index position straight away from the hashCode). If we return the same hashCode for unequal objects, the objects go to the same index of the array (so internally it becomes a linked list at that position, hence bad performance on retrieval). – M Sach Oct 06 '13 at 14:27
  • @MSach That is correct, but as I demonstrated, even unique hashCodes can end up in the same bucket. – Peter Lawrey Oct 06 '13 at 14:33
  • So you mean to say that to avoid the case where unique hashCodes can end up in the same bucket, we use the multiply-by-prime technique? – M Sach Oct 06 '13 at 14:38
  • You are right that it helps. You are less likely to get a pattern of unique values like the above. E.g. 34 wouldn't have been a good choice ;) – Peter Lawrey Oct 06 '13 at 14:40
  • @MSach My mistake, I should have included the code for hash() http://vanillajava.blogspot.co.uk/2013/10/unique-hashcodes-is-not-enough-to-avoid.html – Peter Lawrey Oct 06 '13 at 15:29
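The bucket-reduction step discussed in these comments can be sketched as follows. A HashMap with a power-of-two capacity (16 by default) keeps only the low bits of the (spread) hash as the bucket index, so distinct hashCodes can still share a bucket. The class and values below are illustrative, not HashMap's actual source.

```java
// Sketch of how a power-of-two-capacity table reduces a hashCode to a
// bucket index: only the low bits survive, so distinct hashCodes such as
// 16, 32 and 48 all land in bucket 0 of a 16-bucket table.
public class BucketIndexDemo {
    static int bucket(int hashCode, int capacity) {
        return hashCode & (capacity - 1);   // capacity must be a power of two
    }

    public static void main(String[] args) {
        for (int h : new int[] {16, 32, 48, 17}) {
            System.out.println(h + " -> bucket " + bucket(h, 16));
        }
        // prints: 16 -> bucket 0, 32 -> bucket 0, 48 -> bucket 0, 17 -> bucket 1
    }
}
```

This is why the comments say a unique hashCode is not enough: uniqueness over all 2^32 integers says nothing about how the values distribute over the handful of low bits a small table actually uses.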