Are partial UUIDs a good idea?

Question

I need to generate and store an identifier per row in a distributed database (high write throughput). There are constraints on the length of the ID, and I'd prefer it to be as small as possible. The ID must be valid UTF-8 text.

I was considering generating a UUIDv4, taking its base16 (hex) representation, removing the hyphens, and keeping a partial subset of the characters. If we need more characters in the future, we take a larger subset.

e.g. Uuid = 123e4567-e89b-12d3-a456-426655440000

Subset = 123e4567e89b

Are there foreseeable issues with this?

Dunno. Imagine we printed a phone book using "partial people uuids" - everyone in there's gonna be listed just by their first name. Can we foresee any issues already? — CBroe, Jul 08 '18 at 02:50
You are taking the timestamp fields of a v4 uuid. Timestamps are susceptible to systematic collision. — Raymond Chen, Jul 08 '18 at 03:05
You can take whatever you want as your key. But the shorter it is, the higher is the possibility of collisions. Especially if the keys are generated in a distributed system. — derpirscher, Jul 08 '18 at 03:33
Depending on the quality of the random generator, the uuid may not be evenly distributed, which may increase the chances of collisions too. — derpirscher, Jul 08 '18 at 03:48
@CBroe you're assuming the first part of a UUID is drawn from a smaller subset of possibilities. As I understand it, 122 bits of a UUIDv4 are pseudo-random and 6 bits are invariant (https://en.m.wikipedia.org/wiki/Universally_unique_identifier?wprov=sfla1); using fewer than 128 bits results in a higher probability of collision. I'm just trying to understand the consequences of using a UUIDv4 vs. rolling my own — rickyrattlesnake, Jul 08 '18 at 06:16

score 1 · Accepted Answer · answered Jul 08 '18 at 02:58

You cannot guarantee that partial UUIDs will be universally unique. Depending on the number of IDs generated, this might not be an issue, especially if you check for duplicates, but perhaps it's better just to write your own ID generator with the length specification that you need. I suppose the actual specification for UUIDs requires a certain number of bits for each one to be deemed universally unique, but your requirements limit length. They do not require the use of actual UUIDs.
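To put a rough number on "depending on the number of IDs generated", here is a minimal sketch of the standard birthday-bound approximation. It assumes the retained characters are uniformly random, which should hold for the first 12 hex characters of a UUIDv4 from a good generator. The function name `collision_probability` is made up for this illustration.

```python
import math


def collision_probability(num_ids: int, bits: int) -> float:
    """Approximate chance that at least two of `num_ids` uniformly
    random `bits`-bit values coincide (birthday bound)."""
    space = 2 ** bits
    exponent = num_ids * (num_ids - 1) / (2 * space)
    # -expm1(-x) evaluates 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


# The question's scale: about 10^6 rows, with prefixes of different lengths.
for hex_chars in (8, 12, 16):
    bits = 4 * hex_chars  # each hex character carries 4 bits
    p = collision_probability(10**6, bits)
    print(f"{hex_chars} hex chars ({bits} bits): p = {p:.3g}")
```

Under these assumptions, a 12-character hex prefix (48 bits) gives a bit under a 0.2% chance of at least one collision across a million rows, while an 8-character prefix is almost certain to collide. A uniqueness check on insert would still be needed either way.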

brianolive

1,573
2
9
19

Yeah, this makes sense. I do have the ability to check uniqueness using the insert operation on the DB, but the fewer collisions, the faster the insert. I only need uniqueness on the order of 10^6. Random number generators need a seed, so I'm guessing a timestamp to the millisecond would be good enough; I'm not expecting that many writes per second – rickyrattlesnake Jul 08 '18 at 06:08

score 0 · Answer 2 · answered Nov 03 '18 at 23:29

If your field must be text and length matters, then using base16 only gives you 4 bits per byte whereas base64 gives 6 bits per byte. In other words, the former needs 50% more bytes to achieve the same collision probability as the latter. You could get to ~7 bits per byte by taking advantage of how UTF-8 works, but that's a lot more work (and risk) for a lot less gain.
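The 50% figure can be checked with a little arithmetic. The sketch below (with a hypothetical helper, `chars_needed`) counts how many characters each encoding needs to carry a given number of random bits, ignoring base64 padding.

```python
import math


def chars_needed(bits: int, bits_per_char: int) -> int:
    """Characters required to carry `bits` bits at `bits_per_char` bits each."""
    return math.ceil(bits / bits_per_char)


for bits in (48, 72, 96):
    hex_len = chars_needed(bits, 4)  # base16: 4 bits per character
    b64_len = chars_needed(bits, 6)  # base64: 6 bits per character
    print(f"{bits} bits: {hex_len} hex chars vs {b64_len} base64 chars")
```

For 48 bits this is 12 hex characters against 8 base64 characters, i.e. hex is 1.5 times as long for the same collision probability.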

There is no point in using a truncated UUID, though; you have to use the whole thing or its anti-collision properties don't hold. If you just want a random string, especially when you have the ability to check for collisions, just generate a random number with the desired number of bits (preferably a multiple of 6) and then base64 encode it.
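A minimal sketch of that approach in Python, assuming a URL-safe base64 alphabet is acceptable for the column. The name `random_id` and the default of 9 bytes (72 bits) are choices made for this illustration, not something from the question.

```python
import base64
import secrets


def random_id(num_bytes: int = 9) -> str:
    """Return a URL-safe base64 string encoding `num_bytes` random bytes.

    A multiple of 3 bytes (24 bits) is also a multiple of 6 bits, so the
    encoded string needs no '=' padding.
    """
    raw = secrets.token_bytes(num_bytes)  # drawn from the OS CSPRNG
    encoded = base64.urlsafe_b64encode(raw)
    # Strip padding in case num_bytes is not a multiple of 3.
    return encoded.rstrip(b"=").decode("ascii")


print(random_id())   # 12 characters carrying 72 random bits
print(random_id(6))  # 8 characters carrying 48 random bits
```

Because `secrets` draws from the operating system's random source, there is no seed to manage, so a millisecond timestamp seed is not needed. The standard library's `secrets.token_urlsafe(num_bytes)` produces the same kind of string in one call. The output is plain ASCII, which is also valid UTF-8.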
