6

I am trying to get a unique hash value for a Java object, such as the following are true:

  1. If A == B then A.HashValue() == B.Hash.HashValue()
  2. If A != B then A.HashValue() != B.HashValue()

Lets say the object contains several boolean and integer fields.

Nishant
  • 54,584
  • 13
  • 112
  • 127
Gjorgji
  • 22,458
  • 10
  • 31
  • 39

6 Answers6

9

// Very important edit...

Gjorgji, I know you accepted the answer below as correct, but I have found it to be incorrect.

If you have a class like this:

class tiny {
    int a;
    public int hashCode() { return a; }
}

You have already maxed out all possible hash codes. (If this isn't clear why, please say so.)

So, if you add ANY more information to the object, if you want that information represented in the hashCode, you're going to have a collision somewhere.

But, for that matter, you don't really want to set out to get a hashCode that's 100% unique to an object. That's really not the point of hashCode!

The point of hashCode is to give you a "unique enough" identifier to the object so you can place it in a hash bucket. It's not for identification so much as it is for classification. The idea is if you have a whole bunch of objects, you're probably not going to have many collisions, so you are probably going to have pretty fast access to what you're looking for if you grouped items by their hashCode.

If this means you deselect my answer as correct, that's okay. It really isn't correct for what you're looking for. My hope is that you realize this explanation of hashCode leads you to the correct usage, thereby keeping the correctness. But as Mark clearly pointed out, this doesn't actually solve the issue you stated.

Below is the old answer:

===========================================================

A good article on it is found here, from Effective Java (hands down the best "I want to learn how to be a good Java developer" book out there.)

http://www.linuxtopia.org/online_books/programming_books/thinking_in_java/TIJ313_029.htm

class Gjorgji {
    boolean a;
    boolean b;
    boolean c;
    int x;
    int y;

    // EDIT: I almost forgot a VERY important rule...
    // WHEN YOU OVERRIDE hashCode, OVERRIDE EQUALS (and vice versa)
    public int equals(Object o) {
        if(!(o instanceof Gjorgji) return false;
        Gjorgji g = (Gjorgji)o;
        return a == g.a && b == g.b && c == g.c && x == g.x && y == g.y;

    }

    public int hashCode() {
        int hash = x ^ y;
        hash *= a ? 31 : 17; // pick some small primes
        hash *= b ? 13 : 19;
        hash *= c ? 11 : 29;
        return hash;
    }

}
Vivek
  • 11,938
  • 19
  • 92
  • 127
corsiKa
  • 81,495
  • 25
  • 153
  • 204
  • 2
    This does not guarantee that if `a.hashCode() == b.hashCode()` then `a == b`. – rlibby Feb 15 '11 at 05:26
  • @rlibby If you read the comments, OP said `I consider == as all fields of the objects are the same.` He's merely looking for a good way to fulfill the hashCode contract. – corsiKa Feb 15 '11 at 05:27
  • Haha. That's the answer? I was totally fooled by the question then :O +1 for being chosen. –  Feb 15 '11 at 05:40
  • If this is an acceptable answer, there is something seriously flawed with the question. But bravo for apparently being able to read minds :-). This does not guarantee that equal hashcodes represent equal objects (the second requirement). – Mark Peters Feb 15 '11 at 05:41
  • @Mark very good point. I broke a very good rule to follow, `When you override equals, override hashCode, and vice versa.` – corsiKa Feb 15 '11 at 05:45
  • @glowcoder: No, it's not about the hashCode contract...the OP stated he wanted his hash to return different values for any non-equal objects...this hash has collisions so it can't possibly respect that. – Mark Peters Feb 15 '11 at 05:47
  • 1
    @Mark Interesting. In that light, my 'answer' doesn't actually OP's question at all. It also points out the fact that once you have more possible states than possible hashcodes, you are guaranteed collisions for some inputs, and that happens as soon as you have `int a; boolean b;`. It looks like the best answer to this really is "OP needs to read up on hashCode to learn what it is for". – corsiKa Feb 15 '11 at 05:51
  • @glowcoder, the general term for this is the [Pigeonhole principle](http://en.wikipedia.org/wiki/Pigeonhole_principle). – rlibby Feb 15 '11 at 06:28
  • You also should add a method `public boolean equals(Object)` to be able to use this class in collections and such. This is the method which should be compatible to `.hashCode()`. – Paŭlo Ebermann Feb 15 '11 at 15:52
4

This is not possible in general, you must guarantee that if a.equals(b), then a.hashCode() == b.hashCode(). You cannot guarantee the reverse: you can always have collisions because the hashCode method only has a 32bit space and your JVM can have a 64bit space for identity hashcodes.

sjr
  • 9,769
  • 1
  • 25
  • 36
3

You can do this if you can limit the number of instances of your class to under 232. Here's one way:

import java.util.concurrent.atomic.AtomicInteger;

class UniqueHash {
    private static AtomicInteger NEXT_HASH_CODE = new AtomicInteger();
    private final int hashCode;

    UniqueHash() {
        while (true) {
            int nextHashCode = NEXT_HASH_CODE.get();
            if (nextHashCode == -1) {
                throw new RuntimeException("Too many instances!");
            }
            if (NEXT_HASH_CODE.compareAndSet(nextHashCode, nextHashCode + 1)) {
                hashCode = nextHashCode;
                break;
            }
        }
    }

    public int hashCode() {
        return hashCode;
    }
}

Edit 1: this was assuming that by "a == b" you meant a == b in the sense of object identity. You mention in the comments that you actually mean if the fields are equal. See the responses by @Mark Peters and @sjr.

Edit 2: fixed bug pointed out by @Tom Hawtin - tackline, left other bad practice in place. :)

Edit 3: there was a race in my "fix". Fixed the race.

Wolfson
  • 1,187
  • 17
  • 22
rlibby
  • 5,931
  • 20
  • 25
  • There's a bug in your code. :) And several pieces of bad practice. – Tom Hawtin - tackline Feb 15 '11 at 05:37
  • @Tom Hawtin, please elaborate. :) I would agree that this is a bad idea in general, but given my initial understanding of the problem (hash codes equivalent to object identity) I couldn't think of a better way than this. – rlibby Feb 15 '11 at 05:45
  • You should look at the code harder to see the bug. It would be no fun to just tell you. (It's nothing a short test would find.) – Tom Hawtin - tackline Feb 15 '11 at 05:49
  • @rlibby - Bad practice #1 - don't throw `RuntimeException`. – Stephen C Feb 15 '11 at 05:51
  • @Stephen C, thanks, I agree in general. If this were a real program, I might define a `TooManyInstancesException extends RuntimeException`, but the purpose of that line was merely to demonstrate the fundamental limitation of this technique to under 2^32 instances, and I didn't think defining a second class would make things any more clear. – rlibby Feb 15 '11 at 06:02
  • @Tom Hawtin, okay, there are two bugs. The first is that only the 2^32th instance will throw the exception. The second is that a derived class could break the contract by overriding `hashCode()`. I updated the program above to fix the first bug and left the second in place. – rlibby Feb 15 '11 at 06:12
  • Yup, I was looking for the only multiples of 2^32 throwing. And a nice cas-loop. – Tom Hawtin - tackline Feb 15 '11 at 15:16
3

I am trying to get a unique hash value for a Java object...Lets say the object contains several boolean and integer fields.

To go this you need longer than a 32-bit integer then, or you need to define constraints on the range of your fields. It's simply impossible to stuff more than 32 bits of information into 32 bits, and having an int and a boolean alone is 33 bits of info (assuming every value of the int is possible).

A long won't even be big enough if you have more than one int field. You'd need to go into a BigInteger, BitSet or a byte array.

Anyway, say your data doesn't span more than 32 bits. Then it's just a matter of arranging your data into a bit field represented by an int.

byte a;
byte b;
boolean c;
boolean d;

int hash = (a << 24) | (b << 16) | (c ? 0x02 : 0) | (d ? 0x01 : 0);


//layout
//index:                         ... 3210                             
//aaaa aaaa bbbb bbbb 0000 0000 0000 00cd

This doesn't make it a well-distributed hash (for using in a hash table, e.g.). However, if you want to guarantee uniqueness you probably aren't trying to use it for a hash table are you?

I'm curious why you have this bizarre requirement. The normal purpose of a hash is to get a value that is likely to be unique but of fixed (reduced) size. Your requirement guarantees that the hash needs to be as wide as the data it represents.

Mark Peters
  • 80,126
  • 17
  • 159
  • 190
2

Use System.identityHashCode()

http://download.oracle.com/javase/1.5.0/docs/api/java/lang/System.html#identityHashCode(java.lang.Object)

Edit: it's true that you can't guarantee uniqueness of hash codes with this method; however, I think it's the best you can do given that you can't get an Object's memory location. Any other hash function you come up with will necessarily have the property that two structurally equivalent Objects hash to the same value, whereas this function at least gives you a chance of all of the Objects your program creates having different hashcodes.

For completeness: an Object's default hash code is computed once, when the Object is constructed, from its initial memory location. So if more than one Object is created with the same initial memory location they will necessarily have the same hash code.

danben
  • 80,905
  • 18
  • 123
  • 145
  • Does this guarantee uniquess for each object (if the objects are not equal)? Im asking because System.HashCode() does not guarantee that: – Gjorgji Feb 15 '11 at 05:18
  • 1
    @Giorgio That does the same as `Object.hashCode()` *even if* it is overloaded. Note that two different objects *may* still have the same hash-code. However, if `a == b` then `identityHashCode(a) == identityHashCode(b)` independent upon any `hashCode/equals` overrides. –  Feb 15 '11 at 05:20
  • You can't rely on this, right? The identityHashCode is an int, JVM could be 64bit and so you can still have collisions. – sjr Feb 15 '11 at 05:22
  • This doesn't strictly guarantee what @Gjorgji is asking; these values may collide. Edit: I see this was pointed out to you while I was still writing. :) – rlibby Feb 15 '11 at 05:22
1

How to get a "unique ID" -- I do not really recommend this :-) However, it does fulfill the requirements in the question. See IdentityHashMap and consider weak references.

  1. Use a Map object -> integer where the integer represents a counter.
  2. For each new object seen, increment the counter and add it to the Map.
  3. For each existing object, return the value stored.

There may also be implementation-specific methods: for instance, on Sun, I believe Object.toString (base method) always returns a unique string per that objects lifetime. The "encoded number" could be siphoned out and is the "internal reference" AFAIK.

I make no guarantees about accuracy of the preceding paragraph. YMMV. Happy coding.

  • How do you know when you are seeing a new object? Map is going to use the object's `equals` method to determine whether it has been seen before. – danben Feb 15 '11 at 05:30
  • @danben I guess I didn't say so explicitly: [see IdentityHashMap](http://download.oracle.com/javase/1.4.2/docs/api/java/util/IdentityHashMap.html) there are also "weak" forms. –  Feb 15 '11 at 05:31