It is a known fact that Java's `hashCode()` returns a 32-bit `int`, so it can produce at most about 4 billion distinct hash codes.

I am using the hash code of a string (for example Fname + Lname + DOB + DATE) as the primary key in my database.

In `@PrePersist` I set the key to this hash code, which gives every new user an ID (it has to be unique).

Now I am running out of hash codes. A possible alternative is to use SHA-2, MD5, etc.

How can I increase the size of the hash code and still keep collisions rare?

Ashish
  • Couldn't you just define a method similar in functionality to `hashCode()` with a `long` or `BigInteger` return type? – Logan Mar 16 '17 at 15:33
  • Do not use the hash code as the primary key in the database. Hash codes are not unique and are not suitable as unique identifiers. Using a different hash algorithm will not help, because those hashes are in principle not unique either (although in practice hash collisions should be rare). Using the hash code as the primary key is a fundamental mistake in the design of your system. – Jesper Mar 16 '17 at 15:34
  • What happens when two users with the same first name, last name, and date of birth register on the same day? I know that scenario is probably rare, but what happens when it does occur? – dstarh Mar 16 '17 at 15:39
  • The hash code in my case is computed from a composition of fields that makes the tuple unique, and it then becomes my primary key. For my case it acts as checksum + uniqueness + primary key. – Ashish Mar 16 '17 at 15:40
  • @Ashish that's all well and good, but in the real world those fields can't be guaranteed to be unique. – dstarh Mar 16 '17 at 15:41
  • If you REALLY want them to be unique, create a unique index in your database and use whatever normal autogenerated primary key your database provides. – dstarh Mar 16 '17 at 15:42

2 Answers


If your goal is to create a unique identifier for the database, I would suggest using UUID.

UUID version 3 fits your case: it is name-based, so the UUID is derived deterministically from a namespace and a name (here, your concatenated fields).

Some databases have native support for UUIDs, for instance PostgreSQL.
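As a rough sketch of the name-based approach, Java's standard `UUID.nameUUIDFromBytes` produces a version 3 UUID from a byte array. Note that this JDK method takes no separate namespace argument; it hashes exactly the bytes it is given. The class name `NameBasedId`, the helper `fromFields`, the `|` separator, and the sample values are illustrative assumptions, not part of the question's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class NameBasedId {

    // Hypothetical helper: derives a deterministic version 3 UUID from the user's fields.
    static UUID fromFields(String fname, String lname, String dob, String date) {
        // Assumption: the fields never contain '|'. The separator keeps
        // ("ab", "c") and ("a", "bc") from producing the same key.
        String key = String.join("|", fname, lname, dob, date);
        return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID id = fromFields("Jane", "Doe", "1990-01-01", "2017-03-16");
        System.out.println(id + " (version " + id.version() + ")");
    }
}
```

The same input always yields the same UUID, so two users with identical field values would still get the same key. The wider value space reduces accidental collisions between different inputs, but it does not make the underlying fields unique.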

Gonzalo Matheu
  • The hash code in my case is computed from a composition of fields that makes the tuple unique, and it then becomes my primary key. For my case it acts as checksum + uniqueness + primary key. – Ashish Mar 16 '17 at 15:37
  • @Ashish, that is _still_ a terrible idea. Hash codes should never be assumed to be unique. _Stop using hash codes as a primary key._ – Louis Wasserman Mar 16 '17 at 16:18

I think you are confusing `Object.hashCode()`, which you can override and which returns an `int`, with a secure hash function. Those are two different things. `Object.hashCode()` is not intended to return unique integers (always returning 1 is a valid implementation). So using `String.hashCode()` for object identity is not a good idea, since it can and will have collisions. It is intended for use with hash tables such as `HashMap`, which means it is optimized for performance and not for avoiding collisions.
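A small demonstration of the point about collisions: the strings `"Aa"` and `"BB"` are different but have the same `String.hashCode()`, which follows directly from the formula documented for `String.hashCode()`. The class name `HashCodeCollision` and the helper `collide` are made up for this illustration.

```java
class HashCodeCollision {

    // True when two different strings share the same 32-bit hash code.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        System.out.println("Aa".hashCode());
        // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println("BB".hashCode());
        System.out.println(collide("Aa", "BB")); // prints "true"
    }
}
```

If such a hash code were the primary key, the second of these two records could not be inserted, even though the underlying data differs.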

You can indeed use SHA-1, SHA-2, SHA-3, or MD5 if you want some kind of content hash. If not, use `SecureRandom` or `UUID` to generate something random. All of these have a very low probability of ever giving you a collision (though not exactly zero, of course).
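Both options can be sketched with the standard library alone: `MessageDigest` for a SHA-256 content hash and `UUID.randomUUID()` for a random identifier. The class name `ContentHash` and the helpers `sha256Hex` and `randomId` are hypothetical names chosen for this example, and the hex-string encoding is just one possible way to store the digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

class ContentHash {

    // Content hash: the same input always gives the same 64-character hex string.
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Random identifier: independent of the record's content.
    static String randomId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("Jane|Doe|1990-01-01|2017-03-16"));
        System.out.println(randomId());
    }
}
```

The content hash is deterministic, so identical inputs still map to the same value; the random ID avoids that, at the cost of no longer acting as a checksum of the fields.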

Jilles van Gurp