I came upon an interesting problem today, and scoured the internet looking for a solution but didn't find any. The problem is this:

A user creates an account and is given a unique ID number, say 123, to represent the account. When another user creates an account, I could just add 1 to the most recently created ID number and assign them 124. However, this doesn't completely anonymize everybody, since the new user now knows that user 123 registered before them. It's a very small problem to have, but in some conceivable situations it could cause much larger problems.

A better solution would be to have IDs random but unique so that nobody can tell who came first.

To solve this problem, one might use a standard hash function or random number generator to create a unique ID for each person, but then you run into the possibility of collisions. This can be avoided by checking for collisions and running again, but let's say for this example that this will slow down the system too much. Or it could be that the generator is running on incomplete information and cannot check to see if there are collisions.

A different idea that I came up with is to basically have a shuffled deck of cards that you store and take one off the top any time you need a new ID. When you run out of cards in the deck, you take a new deck continuing at the highest card in your last deck and shuffle that one. Downsides to this are that you must store this deck of cards and if you somehow accidentally lose the deck, you run into many problems trying to recreate it or continuing on without it.
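To make the deck idea concrete, here is a minimal sketch, assuming the deck lives in memory and that decks are refilled in consecutive blocks. The class name `DeckIdSource` and its parameters are hypothetical, chosen only for illustration.

```python
import random


class DeckIdSource:
    """Hands out IDs from a shuffled deck, refilling with the next block when empty."""

    def __init__(self, deck_size, rng=None):
        self.deck_size = deck_size
        self.rng = rng or random.Random()
        self.next_start = 0   # first ID of the next deck to be created
        self.deck = []        # the stored state that must not be lost

    def draw(self):
        if not self.deck:
            # Start a new deck where the previous one ended, then shuffle it.
            start = self.next_start
            self.deck = list(range(start, start + self.deck_size))
            self.rng.shuffle(self.deck)
            self.next_start += self.deck_size
        return self.deck.pop()


source = DeckIdSource(deck_size=10)
print([source.draw() for _ in range(12)])
```

The sketch also shows the downside described above: `deck` and `next_start` are state that has to be persisted somewhere.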

A very similar solution to this one is to recreate this shuffled deck based off a fixed seed every time, and take the nth card of the deck instead of the top one. The problem that this has is it can be expensive to shuffle this deck every time you need a new card.
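A rough sketch of this seeded variant, assuming Python's `random.Random` as the shuffler (the function name `nth_card` is hypothetical):

```python
import random


def nth_card(n, seed, deck_size):
    """Rebuild the same shuffled deck from `seed` and return its n-th card."""
    deck = list(range(deck_size))
    random.Random(seed).shuffle(deck)  # same seed gives the same order every call
    return deck[n]


print(nth_card(0, 123, 1000), nth_card(1, 123, 1000))
```

Nothing needs to be stored, but every call does work proportional to `deck_size`, which is the cost mentioned above.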

Other mathematical models that I have tried to come up with all have the problem of the next number in the sequence being predictable (each number is a fixed amount apart from the previous one). A lot of them have the problem of having collisions as well.

So my question is: Is there some mathematical model I can plug numbers into to get unique IDs that doesn't require a "deck" (read: array) to be stored in memory or recomputed on every function call?

For example

randomID(number, seed, range)
randomID(1,123,1000) = 284
randomID(2,123,1000) = 739
randomID(3,123,1000) = 088
randomID(3,888,1000) = 912

I have looked at https://code.google.com/p/smhasher/wiki/MurmurHash3, which seems promising, but I don't think it applies over an arbitrary range of numbers, only over 32-bit or 64-bit values.

Glen Takahashi
  • Congratulations! You have just come up with GUID: http://stackoverflow.com/questions/371762/what-exactly-is-guid-why-and-where-i-should-use-it – trailmax Jul 21 '14 at 23:20
  • Not sure why trailmax's answer is a comment, but it's a good answer. Most languages have a library to generate a GUID. The value is not guaranteed to be unique, but the odds of a collision are so astronomically small that, for all practical purposes, GUIDs work as unique, non-sequential IDs. – Yevgeniy Brikman Jul 22 '14 at 03:03

4 Answers

You can use a block cipher to achieve this. When you encrypt a block (a fixed number of bits), the cipher maps it to a different block with the same number of bits. The decryption step undoes this. For a given key, no two different blocks are ever mapped to the same block.

So take your user id of let's say 64 bits and encrypt it with a 64 bit block cipher and a secret key, and you have your randomized user id. To get the original user id back, just decrypt with the same key.

If you use a well-known algorithm like Blowfish (64-bit blocks) or AES (128-bit blocks), the resulting IDs will be as hard to predict without the key as the cipher is to break.
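The question asks for an arbitrary range rather than a fixed block size. One way to sketch that, as an assumption layered on top of this answer rather than part of it, is a small Feistel network (the general construction Blowfish is built on) combined with "cycle walking": keep re-encrypting until the result falls inside the range. The round function below is built from SHA-256 purely for illustration and has not been analyzed for security. The names `random_id` and `feistel_permute` are hypothetical, and inputs are numbered from 0 rather than 1.

```python
import hashlib


def _round(value, key, rnd, bits):
    """Illustrative round function: hash (key, round, value) down to `bits` bits."""
    digest = hashlib.sha256(f"{key}:{rnd}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << bits) - 1)


def feistel_permute(x, key, half_bits, rounds=4):
    """Permute the integers in [0, 2**(2*half_bits)) with a balanced Feistel network."""
    mask = (1 << half_bits) - 1
    left, right = x >> half_bits, x & mask
    for rnd in range(rounds):
        left, right = right, left ^ _round(right, key, rnd, half_bits)
    return (left << half_bits) | right


def random_id(number, seed, size):
    """Map number in [0, size) to a unique ID in [0, size), keyed by seed."""
    if not 0 <= number < size:
        raise ValueError("number must be in [0, size)")
    # Smallest even bit width whose domain covers [0, size).
    half_bits = max(1, ((size - 1).bit_length() + 1) // 2)
    x = feistel_permute(number, seed, half_bits)
    while x >= size:  # cycle walking: re-encrypt until we land inside the range
        x = feistel_permute(x, seed, half_bits)
    return x


print([random_id(n, 123, 1000) for n in range(5)])
```

Because each step is a permutation of the padded domain, distinct inputs below `size` always yield distinct outputs below `size`, with no stored deck.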

HugoRune

I'm not sure exactly how you would store this, but you could create an array large enough to handle all the users who will ever use your site. To assign an ID, pick a random starting index and step forward a random number of positions. If you land on an empty index, put a value (such as 1) in it, and the user gets that index as their ID. If the index already holds a value, repeat the process until you land on an empty one. The nice thing is that you don't even have to iterate, because you can just add the random number to the current index. The only extra logic would be some sort of mod function to handle cases where you run past the end of the array. Hope this helps.
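A minimal sketch of this idea, assuming the array fits in memory and that a fresh random step is drawn after each occupied slot (the class name `SlotAllocator` is hypothetical):

```python
import random


class SlotAllocator:
    """Marks slots in a fixed-size array; the slot index is the user's ID."""

    def __init__(self, capacity, rng=None):
        self.used = bytearray(capacity)  # 0 = free, 1 = taken
        self.free = capacity
        self.rng = rng or random.Random()

    def allocate(self):
        if self.free == 0:
            raise RuntimeError("no free IDs left")
        size = len(self.used)
        index = self.rng.randrange(size)
        while self.used[index]:
            # Jump by a random amount, wrapping around with mod.
            index = (index + self.rng.randrange(1, size)) % size
        self.used[index] = 1
        self.free -= 1
        return index


allocator = SlotAllocator(1000)
print([allocator.allocate() for _ in range(5)])
```

Note that the number of retries grows as the array fills up, so this works best when the capacity is well above the expected number of users.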

Maxqueue

You could select a pseudorandom number generator with a period larger than the maximum number of users you will ever need to support. Then you just need to seed the PRNG with the last-used value in order to generate the next one. If you somehow lose track of the last-used value, you can start again from the initial seed and generate values based on the number of already registered users. You'd probably want to avoid a PRNG with excessively large values (e.g. perhaps find a 16-bit one with a period of 2^16 if you'll have fewer than 65536 users), so the numbers are practical to remember.
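As one possible illustration, here is a sketch using a 16-bit linear congruential generator whose state is its output. The constants are an assumption chosen to satisfy the Hull–Dobell conditions for a full period (increment odd, multiplier minus 1 divisible by 4), so every value in [0, 65536) appears exactly once per cycle. The function names are hypothetical.

```python
M = 1 << 16   # modulus: IDs are in [0, 65536)
A = 25173     # multiplier, A - 1 is divisible by 4
C = 13849     # increment, odd and therefore coprime with M


def next_id(last_id):
    """Return the ID that follows last_id in the full-period sequence."""
    return (A * last_id + C) % M


def nth_id(seed, n):
    """Recover the n-th ID from the initial seed, e.g. after losing the last value."""
    value = seed
    for _ in range(n):
        value = next_id(value)
    return value


print(next_id(0), nth_id(0, 3))
```

Anyone who knows or guesses the constants can compute the next ID from the current one, so this hides registration order only from casual observers.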

Tony Delroy

Here is an approach that is flexible and efficient:

  1. Maintain a hash table of the IDs already in use.
  2. Select a number M that is proportional to the hash table size you need.
  3. Generate random numbers for the first M IDs, using hash table lookups to prevent collisions.
  4. After those M generations, build an array of size M+1 and add the value id+1 for each of the M previous IDs, provided id+1 is unused.
  5. Add ID 0 to the array if it has not been used already.
  6. For every subsequent ID, select an ID from the array at random.
  7. Add id+1 for the selected ID to the array if it is not in the hash table.
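The steps above can be sketched roughly as follows. This is one reading of the description, not a definitive implementation: it assumes the first M IDs are drawn from a caller-supplied range, and that a selected array entry is replaced by id+1 (or dropped if id+1 is already taken). The class name `FrontierIdGenerator` and its parameters are hypothetical.

```python
import random


class FrontierIdGenerator:
    def __init__(self, m, initial_range, rng=None):
        if not 1 <= m <= initial_range:
            raise ValueError("need 1 <= m <= initial_range")
        self.m = m
        self.initial_range = initial_range
        self.rng = rng or random.Random()
        self.used = set()    # step 1: hash table of issued IDs
        self.frontier = []   # steps 4-5: candidate IDs, built after M draws

    def next_id(self):
        if len(self.used) < self.m:
            # Step 3: random draw, retrying on collision.
            while True:
                new_id = self.rng.randrange(self.initial_range)
                if new_id not in self.used:
                    break
            self.used.add(new_id)
            if len(self.used) == self.m:
                # Steps 4-5: successors of issued IDs, plus 0 if unused.
                self.frontier = [i + 1 for i in self.used if i + 1 not in self.used]
                if 0 not in self.used:
                    self.frontier.append(0)
            return new_id
        # Step 6: pick a random candidate in O(1).
        slot = self.rng.randrange(len(self.frontier))
        new_id = self.frontier[slot]
        self.used.add(new_id)
        if new_id + 1 not in self.used:
            self.frontier[slot] = new_id + 1          # step 7
        else:
            self.frontier[slot] = self.frontier[-1]   # drop the dead candidate
            self.frontier.pop()
        return new_id


gen = FrontierIdGenerator(m=8, initial_range=1000)
print([gen.next_id() for _ in range(12)])
```

Under this reading the array never empties, because the successor of the largest issued ID is always an unused candidate.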

Advantages:

  1. You can regulate the randomness and the storage used through M. The higher M is, the more random your IDs are, so there is a trade-off between space used and randomness.
  2. You can easily use an in-memory database like Redis for the hash table and the array.
  3. The time complexity of generating a unique ID is O(1).
Vikram Bhat