17

How do I generate a unique ID value that can be easily passed on via phone or email and easily remembered, while still not being easily guessable?

I am using a database, but as I am giving the ID away to people I do not want it to be bound to the database. I could derive something from the unique ID I already have in the database, but I cannot use that ID directly, because it would be guessable.

I am using Python and have tried using uuid, but uuid is too long to be human readable.

Is there any way to create a human friendly pronounceable ID?

iamgopal
  • Unique to which base? Do you have a database, or do you want to create a random string/number that is unlikely to collide with later values? Please tell us more about your problem. – schlamar Jul 20 '11 at 12:33
  • What about `hash(str(your_id))` – schlamar Jul 20 '11 at 12:37
  • 1
    You can truncate the hash values, but that will make collisions more likely. That's a problem you have to deal with: no collisions -> not readable; readable -> collisions possible. – schlamar Jul 20 '11 at 12:43

5 Answers

13

What you want to do is stitch together syllables to create pronounceable pseudo words. You can create syllables in any language you like to make up words that can be pronounced and communicated but don't actually mean anything.
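
As a rough illustration of the syllable idea, here is a minimal sketch that joins random consonant-vowel pairs. The letter sets and the function name `pronounceable_id` are assumptions made for this example, not something from the article mentioned in this answer; `secrets` is used so the result is hard to guess.

```python
import secrets

# Assumed letter sets: consonants that are hard to confuse when spoken.
CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"

def pronounceable_id(syllables: int = 4) -> str:
    """Return a pseudo-word made of random consonant-vowel syllables."""
    return "".join(
        secrets.choice(CONSONANTS) + secrets.choice(VOWELS)
        for _ in range(syllables)
    )

print(pronounceable_id())
```

With these sets each syllable has 14 × 5 = 70 possibilities, so four syllables give about 24 million combinations. That is far too few to be unique by chance alone, so you would still check new IDs against the ones already issued.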

Here is an article about how one person created human-readable UIDs meant to be spoken phonetically. Read it for some of the pitfalls you should consider when taking an approach like this.

You could just use a string of alphabetic letters but present them as the NATO phonetic alphabet instead of just the alphabet.
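
A small sketch of the phonetic-alphabet presentation, assuming the ID contains only the letters A-Z (the helper name `spell_out` is made up for this example; digits or other characters would raise a `KeyError`):

```python
# NATO/ICAO spelling alphabet, keyed by uppercase letter.
NATO = {
    "A": "Alfa", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliett",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}

def spell_out(code: str) -> str:
    """Spell an alphabetic ID as phonetic-alphabet words."""
    return " ".join(NATO[ch] for ch in code.upper())

print(spell_out("kqz"))  # Kilo Quebec Zulu
```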

3

For emails, what I use is:

from base64 import b64encode
from os import urandom
key = b64encode(urandom(9)).decode("ascii")  # 9 random bytes -> 12-character string (b64encode returns bytes on Python 3)

You can increase/decrease the length by changing the number. Sometimes you will get + and / characters and you can strip them out if you like.

Edit: Since you also want to pass them over the phone, maybe b32encode(urandom(5)) would be a better choice, since it won't give you any lowercase or unusual characters.
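
For reference, a sketch of the base32 variant as a small function (the name `phone_friendly_key` is just for illustration). Base32 uses only the letters A-Z and the digits 2-7, and 5 bytes encode to exactly 8 characters with no padding:

```python
from base64 import b32encode
from os import urandom

def phone_friendly_key(num_bytes: int = 5) -> str:
    """Return random bytes encoded as uppercase base32 text."""
    # b32encode returns bytes; decode to str and drop any '=' padding,
    # which only appears when num_bytes is not a multiple of 5.
    return b32encode(urandom(num_bytes)).decode("ascii").rstrip("=")

print(phone_friendly_key())
```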

nima
2

How about something like Amazon's payphrases? Convert the binary ID to a sequence of English words.

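If you want something with the same range as a UUID, you need to represent 16 bytes (128 bits). Note that a word chosen from 65,536 possibilities encodes only 2 bytes, so a 4-word phrase covers 8 bytes, and the full 16 bytes would take 8 such words. To keep it reasonable, restrict the phrase to 4 words of 2 bytes each; with a separate 65,536-word list for each position, you'll need a dictionary of 262,144 words.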
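
A minimal sketch of the byte-to-word mapping, assuming one shared list of 65,536 words so that each word stands for 2 bytes. The `WORDS` list below is a placeholder (`word0`, `word1`, ...) that only makes the example runnable; a real list would hold distinct, easily pronounced English words. The function names are hypothetical.

```python
import secrets

# Placeholder word list (an assumption): substitute 65,536 real words.
WORDS = [f"word{i}" for i in range(65536)]
WORD_INDEX = {word: i for i, word in enumerate(WORDS)}

def bytes_to_phrase(data: bytes) -> str:
    """Map each 2-byte chunk of data to one word from WORDS."""
    if len(data) % 2:
        raise ValueError("need an even number of bytes")
    return " ".join(
        WORDS[int.from_bytes(data[i:i + 2], "big")]
        for i in range(0, len(data), 2)
    )

def phrase_to_bytes(phrase: str) -> bytes:
    """Invert bytes_to_phrase: look each word up and rebuild the bytes."""
    return b"".join(WORD_INDEX[w].to_bytes(2, "big") for w in phrase.split())

raw_id = secrets.token_bytes(8)      # 8 random bytes -> a 4-word phrase
phrase = bytes_to_phrase(raw_id)
assert phrase_to_bytes(phrase) == raw_id
```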

EDIT: Actually on reflection what might be better is a sort of mad lib sentence - it will restrict the number of needed words and may make it easier to remember since it has a grammatical structure. It will need to be longer, of course, perhaps something like this:

(a/an/the/#) (adj) (noun) (verb)(tense) (adverb) while (a/an/the/#) (adj) (noun) (verb) (adverb).
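
To make the template idea concrete, here is a sketch that fills a shortened version of the pattern (determiner, adjective, noun, verb, adverb). The tiny word lists are invented for this example and give very little variety; real lists would need to be much larger. Determiners are limited to words that need no a/an agreement.

```python
import secrets

# Hypothetical slot lists; every entry is a single word.
DETERMINERS = ["the", "one", "two", "three"]
ADJECTIVES = ["quiet", "purple", "brave", "sleepy"]
NOUNS = ["otter", "piano", "comet", "baker"]
VERBS = ["danced", "waited", "sang", "jumped"]
ADVERBS = ["slowly", "loudly", "twice", "nearby"]

SLOTS = (DETERMINERS, ADJECTIVES, NOUNS, VERBS, ADVERBS)

def madlib_id() -> str:
    """Pick one random word per slot and join them into a short sentence."""
    return " ".join(secrets.choice(slot) for slot in SLOTS)

print(madlib_id())
```

The number of distinct sentences is the product of the list lengths (here 4^5 = 1,024), which is why each slot needs a large list before such IDs become hard to guess.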

so12311
  • Or of 65536 words, if you allow "Foo Foo Foo Foo" as a valid payphrase. – agf Jul 20 '11 at 12:57
  • Your edit is too complicated. Just loading a dictionary and using `memorable_id = ' '.join(dictionary[random.randint(0, 65535)] for _ in range(4))` is better. – agf Jul 20 '11 at 13:24
0

Here's a uuid-based example. Adjust the 1000000 to increase or decrease the range of your ids. Since you're reducing the range of the id, you'll probably have to check to see if the ID already exists.

>>> import uuid
>>> hash(str(uuid.uuid1())) % 1000000
380539
>>> hash(str(uuid.uuid1())) % 1000000
411563
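
The existence check mentioned above could look roughly like this. Note that this sketch swaps the hashed UUID for `secrets.randbelow`, which draws directly from the reduced range, and it uses an in-memory set as a stand-in for whatever store holds the IDs already issued (both are assumptions for the example).

```python
import secrets

def new_short_id(existing: set, id_range: int = 1_000_000) -> int:
    """Draw random IDs until one is not already taken, then reserve it."""
    if len(existing) >= id_range:
        raise RuntimeError("ID space exhausted")
    while True:
        candidate = secrets.randbelow(id_range)
        if candidate not in existing:
            existing.add(candidate)
            return candidate

issued = set()
print(new_short_id(issued))
```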
robert
  • 2
    Randomness has nothing to do with uniqueness. –  Jul 20 '11 at 12:48
  • 1
    Using a random number is completely equivalent to using a hash of equal size, if you don't need to reproduce the same ID again later for the same object. – agf Jul 20 '11 at 12:55
0

Sure, but it requires a few more restrictions on your problem space, namely:

  1. There is only one thing generating unique IDs
  2. Your items have some concept of a title
  3. You can persist a list of strings

Then you'd do something like:

_UID_INTERNALS = set()

def getID(obj):
    if hasattr(obj, 'UID'):
        return obj.UID
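    # Drop non-ASCII characters, then decode back to str so the string methods below work on Python 3
    title = obj.title.encode("ascii", errors="ignore").decode("ascii")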
    title = title.lower()
    title = "-".join(title.split())
    if not title:
        title = "unnamed-object"
    UID = title
    num = 1
    while UID in _UID_INTERNALS:
        UID = title + str(num)
        num += 1
    _UID_INTERNALS.add(UID)
    obj.UID = UID
    return UID
MatthewWilkes
  • As above, what's wrong with this code? It works perfectly adequately and solves your question as stated. – MatthewWilkes Jul 25 '11 at 10:15
  • If the OP is currently using UUID, then it's almost certain that there will be more than one thing generating unique IDs as it is almost certainly a distributed system. Frankly, if you can generate the IDs centrally the problem is trivial to solve. – Ian Goldby Apr 30 '21 at 09:45
  • OP is using uuid because of resistance to guessing. In fact, as they say that they already have a database entry with a unique ID, it's quite *unlikely* that the problem space is a distributed system. The problem isn't generating a unique ID, per se; it's choosing a scheme that minimises error when spoken and doesn't allow enumeration. One example is news items on a website, where they might have an internal ID but that leaks info about unpublished items. Fundamentally, the issue is that this question is ambiguous. – MatthewWilkes May 01 '21 at 10:40