10

What would be a good way to abbreviate a UUID for use in a button in a user interface when the id is all we know about the target?

GitHub seems to abbreviate commit ids by taking 7 characters from the beginning. For example, b1310ce6bc3cc932ce5cdbe552712b5a3bdcb9e5 would appear in a button as b1310ce. While not perfect, this shorter version is sufficient to look unique in the context where it is displayed. I'm looking for a similar solution that would work for UUIDs. I'm wondering if some part of the UUID is more random than another.

The most straightforward option would be splitting at the first dash and using the first part. The UUID 42e9992a-8324-471d-b7f3-109f6c7df99d would then be abbreviated as 42e9992a. All of the solutions I can come up with seem equally arbitrary. Perhaps there is some outside-the-box user interface design solution that I didn't think of.

cyberixae
  • Note that this can be converted to Base62: it will become shorter (at most 22 characters) than the hex form (36 characters with dashes), though the length may vary. Maybe it would become short enough that you won't need to abbreviate? And if you have only 1 node, or multiple nodes that aren't under high stress, you may also remove some of the bits from the UUID, which could get you to mm.. 12 letters. – Stanislav Bashkyrtsev May 27 '20 at 10:29
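To illustrate the Base62 idea from the comment above, here is a minimal sketch. The alphabet order (digits, then uppercase, then lowercase) and the name `to_base62` are assumptions made for this example; there is no single standard Base62 alphabet.

```python
import string
import uuid

# Assumed alphabet order: 0-9, A-Z, a-z (62 symbols in total).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def to_base62(u: uuid.UUID) -> str:
    """Encode the 128-bit integer value of a UUID in Base62."""
    n = u.int
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


print(to_base62(uuid.UUID("42e9992a-8324-471d-b7f3-109f6c7df99d")))
```

A 128-bit value needs at most 22 Base62 characters, so this shortens the id but probably not enough for a button label.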

3 Answers

6

Entropy of a UUID is highest in the first bits for UUID V1 and V2, and evenly distributed for V3, V4 and V5. So, the first N characters are no worse than any other N-character subset.

For N=8, i.e. the group before the first dash, the odds of a collision within a list you could reasonably display on a single GUI screen are vanishingly small.
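To put a rough number on "vanishingly small", here is a sketch using the standard birthday approximation. It assumes the 8-character prefixes are uniformly random, which is reasonable for V4 but only an approximation for other versions; the function name `collision_probability` is made up for this example.

```python
import math


def collision_probability(n_items: int, hex_chars: int = 8) -> float:
    """Approximate chance that at least two of n_items random prefixes match."""
    space = 16 ** hex_chars  # number of distinct prefixes of this length
    # Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * space))
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))


for n in (10, 100, 1000):
    print(n, collision_probability(n))
```

Under that assumption, a screen showing 100 ids has a collision chance on the order of one in a million.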

StephenS
  • UUIDs contain version (in the 13th character) and variant (in the 17th character) bits so the claim: "the first N characters are no worse than any other N characters subset" is not completely true. The version is 4 bits so the 13th character is always constant and the variant contains 2-3 bits so the 17th character is only partially random. – vkopio Feb 25 '22 at 13:23
  • 1
    @vkopio Right, which means the first 8 characters are *better* than the 13th and 17th characters. “Better” falls under “not worse”. – StephenS Feb 25 '22 at 13:56
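The fixed positions discussed in these comments can be checked with a short sketch. It counts hex digits without the dashes, and `version_and_variant_chars` is a name invented for this example.

```python
import uuid


def version_and_variant_chars(u: uuid.UUID):
    """Return the 13th and 17th hex digits (dashes not counted)."""
    h = u.hex  # 32 hex digits, no dashes
    return h[12], h[16]


# For random (V4) UUIDs the first value is always '4' and the second
# is one of '8', '9', 'a' or 'b'.
for _ in range(5):
    u = uuid.uuid4()
    print(u, version_and_variant_chars(u))
```

Neither position falls within the first 8 characters, so abbreviating to the first group avoids both.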
2

The question is whether you want to show part of the UUID or only ensure that unique strings are presented as shorter unique strings. If you want to focus on the latter, which appears to be the goal you are suggesting in your opening paragraph:

(...) While not perfect this shorter version is sufficient to look unique in the context where it is displayed. (...)

you can make use of hashing.

Hashing:

Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value.

Hashing is very common and easy to use in many popular languages. A simple approach in Python:

import hashlib
import uuid
encoded_str = uuid.UUID('42e9992a-8324-471d-b7f3-109f6c7df99d').bytes
hash_uuid = hashlib.sha1(encoded_str).hexdigest()
hash_uuid[:10]
'b6e2a1c885'

As expected, a small change in the input results in a completely different abbreviation, so distinct UUIDs remain visibly distinct.

# Second digit is replaced with 3, rest of the string remains untouched 
encoded_str_two = uuid.UUID('43e9992a-8324-471d-b7f3-109f6c7df99d').bytes
hash_uuid_two = hashlib.sha1(encoded_str_two).hexdigest()
hash_uuid_two[:10]
'406ec3f5ae'
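The two snippets above can be folded into one helper. This is only a sketch: the name `short_hash` and the default length of 10 are choices made for this example, and SHA-1 is used simply because the answer uses it.

```python
import hashlib
import uuid


def short_hash(uuid_str: str, length: int = 10) -> str:
    """Return the first `length` hex digits of the SHA-1 of the UUID's bytes."""
    raw = uuid.UUID(uuid_str).bytes
    return hashlib.sha1(raw).hexdigest()[:length]


ids = [
    "42e9992a-8324-471d-b7f3-109f6c7df99d",
    "43e9992a-8324-471d-b7f3-109f6c7df99d",
]
labels = [short_hash(i) for i in ids]
print(labels)
# Check that the abbreviations shown together are actually distinct.
print(len(set(labels)) == len(labels))
```

Running the distinctness check on the list that is actually displayed catches the rare case where two abbreviations collide.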
Olivier Grégoire
Konrad
  • If you use Python, why don't you use the correct tools at your disposal? `encoded_str = uuid.UUID('42e9992a-8324-471d-b7f3-109f6c7df99d').bytes` – Olivier Grégoire May 27 '20 at 10:44
  • @OlivierGrégoire Thanks for your comment. After using `uuid`, the `encoded_str` becomes `b'B\xe9\x99*\x83$G\x1d\xb7\xf3\x10\x9fl}\xf9\x9d'` which, IMHO, makes the example more difficult to follow. I wanted to illustrate an idea, not suggest a specific implementation. In an actual solution the OP should consider other aspects, like the choice of hashing algorithm, etc., but that conversation would be more applicable to a question about a specific implementation, I reckon. – Konrad May 27 '20 at 10:52
  • 2
    I meant as the first step. The full steps should be (in a better written form): `hashlib.sha1(uuid.UUID('42e9992a-8324-471d-b7f3-109f6c7df99d').bytes).hexdigest()[:10]` – Olivier Grégoire May 27 '20 at 10:56
  • 1
    @OlivierGrégoire I see, no problem with that suggestion at all. Please feel at liberty to edit the answer. – Konrad May 27 '20 at 10:57
  • Being able to associate the button with the UUID seems more important, so I guess I'm forced to choose some part of the original UUID. Would still be nice to avoid the worst case scenario where all UUIDs are abbreviated to the same hex value. I could perhaps use a QR code if using special software to decipher the value were possible. – cyberixae May 27 '20 at 12:51
0

After thinking about this for a while, I realised that the short git commit hash is used as part of command line commands. Since this requirement does not exist for UUIDs and graphical user interfaces, I simply decided to use an ellipsis for the abbreviation. Like so: 42e9992...
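A minimal sketch of that abbreviation, assuming 7 leading characters and a plain three-dot ellipsis (the function name `abbreviate` is just for illustration):

```python
def abbreviate(identifier: str, keep: int = 7) -> str:
    """Keep the first `keep` characters and mark the cut with an ellipsis."""
    if len(identifier) <= keep:
        return identifier  # nothing to cut, so no ellipsis is needed
    return identifier[:keep] + "..."


print(abbreviate("42e9992a-8324-471d-b7f3-109f6c7df99d"))  # 42e9992...
```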

cyberixae