
I have objects with the following properties:

interface MyObject
{
    int sourceId();
    String id();
}

If I use id as the identifier, there could be collisions, since records may share the same id but have different sourceId values.

Therefore I'm looking into generating a hash of sourceId and id and using that as the unique id for each record. I was thinking of just taking the MD5 of String.valueOf(sourceId + id), but it seems that MD5 collisions are not as uncommon as I'd like.

Which other algorithm would be recommended for this: something that produces a fast hash and makes a collision very improbable?

Ali
  • Would UUID help in this case? The probability of collisions would be very small, but you would also have to accept a fixed length for the id, as a UUID can't be trimmed down. – hamena314 May 21 '15 at 08:03
  • SHA1, SHA256, SHA512, many of them – thinker May 21 '15 at 08:03
  • @hamena314 I can't use UUID, as I also need the ability to regenerate the id from `sourceId` and `id` – Ali May 21 '15 at 08:04
  • The only way to generate a "hash" (it wouldn't really be a hash in the usual meaning of the word) that does not reveal the original values, but from which you can recover them, would be to serialize the data and then encrypt the data blob with a secret key. I mean, what you seem to want is, by definition, encryption. – hyde Dec 13 '15 at 22:17

2 Answers


If the id() String has a fixed length, you can simply concatenate the sourceId and the id:

public String getUniqueID()
{
    // int + String concatenates, so the result is a String
    return sourceId() + id();
}

If id() doesn't have a fixed length, you can pad it with zeroes (for example) to obtain a fixed length and then concatenate it to sourceId() as before.
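A minimal sketch of the padding idea, under assumptions not stated in the post: `ID_WIDTH` is a hypothetical maximum length for id, and ids never begin with the pad character `0` (otherwise "7" and "07" would pad to the same string). The class and method names are illustrative only.

```java
final class PaddedKey {
    // Hypothetical maximum length of id; the original post does not give one.
    static final int ID_WIDTH = 12;

    static String uniqueId(int sourceId, String id) {
        if (id.length() > ID_WIDTH) {
            throw new IllegalArgumentException("id longer than " + ID_WIDTH);
        }
        StringBuilder sb = new StringBuilder();
        sb.append(sourceId);
        // Left-pad id with zeros so it always occupies the last ID_WIDTH characters.
        for (int i = id.length(); i < ID_WIDTH; i++) {
            sb.append('0');
        }
        sb.append(id);
        return sb.toString();
    }
}
```

Because the id part always fills the last `ID_WIDTH` characters, everything before it is the sourceId, so (1, "23") and (12, "3") no longer map to the same key.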

Eran
  • This is a good solution, but I would still prefer to hash them, since that won't expose the underlying values in URLs and such. Any recommendation for the hash algorithm? – Ali May 21 '15 at 08:53
  • @ClickUpvote It depends on the amount of "security" you require. If it's enough to make the unique ID less obvious to the naked eye, you can do something simple. For example, if id() contains only numeric characters, you can make the underlying values less obvious by encoding the unique ID in some higher base (hexadecimal or even Base64). If you require better security, you'll have to use some one-way hashing algorithm, so you wouldn't be able to recover the underlying values either. Is that what you want? – Eran May 21 '15 at 09:13
  • It doesn't have to be unbreakable. But I am concerned about collisions. I don't want two different id/sourceId combos to ever produce the same hash. – Ali May 21 '15 at 10:01
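For the hashing route discussed in these comments, here is a minimal sketch using SHA-256 from the standard `java.security.MessageDigest` API. The class name, method name, and the `:` separator are assumptions for illustration. The separator matters: without it, (1, "23") and (12, "3") would both be hashed as "123" and collide before the hash function is even involved. Note that this is one-way, so the original values cannot be recovered from the result, and that collisions remain theoretically possible for any fixed-length hash, although none is publicly known for SHA-256.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashedKey {
    static String uniqueId(int sourceId, String id) {
        // sourceId is an int, so its text never contains ':';
        // the first ':' therefore always marks where sourceId ends.
        String input = sourceId + ":" + id;
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // Render the 32 digest bytes as 64 lowercase hex characters.
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

The same (sourceId, id) pair always yields the same key, so the key can be regenerated whenever both values are at hand.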

Assuming this value can be a String, I'd just concatenate both values with a hyphen:

class MyObject
{
    int sourceId;
    String id;
    String getUniqueKey() {
        return sourceId + "-" + id;
    }
}

Then you can obtain the original values using value.split("-", 2); the limit of 2 splits only at the first hyphen, so any hyphens inside id stay intact.

Pablo Lozano
  • The problem with that is, there may be hyphens in the `String id()` which may cause conflicts. So I think a hashing algorithm is necessary. – Ali May 21 '15 at 08:51
  • If the number (sourceId) is placed as the first part, then the first hyphen will be the separator, and any other hyphen will be part of the "id" value. Anyway, if you don't want to expose that value, this is not the solution you are looking for, unless you obfuscate it (I'm thinking of Base64) – Pablo Lozano May 21 '15 at 08:59
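The first-hyphen rule and the Base64 idea from the comment above can be sketched together as follows. The class and method names are hypothetical, and Base64 here only hides the values from a casual glance; anyone can decode it, so it is obfuscation and not security. Unlike a hash, this mapping is reversible and cannot collide, since distinct (sourceId, id) pairs produce distinct strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class ObfuscatedKey {
    static String encode(int sourceId, String id) {
        String raw = sourceId + "-" + id;
        // URL-safe alphabet without '=' padding, so the key can go into a URL.
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Returns { sourceId as text, id }.
    static String[] decode(String key) {
        String raw = new String(Base64.getUrlDecoder().decode(key), StandardCharsets.UTF_8);
        // Start searching at index 1 so the minus sign of a negative sourceId is skipped.
        int sep = raw.indexOf('-', 1);
        return new String[] { raw.substring(0, sep), raw.substring(sep + 1) };
    }
}
```

For example, `decode(encode(7, "a-b-c"))` gives back "7" and "a-b-c", with the hyphens inside id preserved.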