
Following up on my question: Unique Linux filename, sortable by time

I need to generate a UUID that is itself alpha-numerically sequential over time. I assume I'll need to prefix it with the system date in seconds since the epoch plus nanoseconds. That means I really just need a UUID algorithm that is alpha-numerically sequential within a given nanosecond.

So, for example, I'm thinking of UUIDs along these lines:

SECONDS_SINCE_EPOCH.NANOSECONDS.UID
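
One thing this format relies on: the prefix only sorts correctly as plain text if every field has a fixed width. GNU `date` zero-pads `%N` to nine digits, and `%s` stays at ten digits from September 2001 until the year 2286. A minimal illustration with made-up timestamps, assuming a POSIX shell and `sort`:

```shell
#!/bin/sh
# Made-up timestamps 0.05 s apart: 1424718695 s + 50000000 ns, then + 100000000 ns.
# Zero-pad the nanoseconds to nine digits, as GNU date's %N does.
a=$(printf '%s.%09d' 1424718695 50000000)    # 1424718695.050000000
b=$(printf '%s.%09d' 1424718695 100000000)   # 1424718695.100000000
# sort -C only checks the order; exit status 0 means already sorted.
printf '%s\n' "$a" "$b" | LC_ALL=C sort -C && echo "padded: in order"

# Without padding, the earlier time compares as the larger string.
c=1424718695.50000000
printf '%s\n' "$c" "$b" | LC_ALL=C sort -C || echo "unpadded: out of order"
```

The comparison is character by character, so `.5...` beats `.1...` even though 50000000 ns is the smaller value; `LC_ALL=C` keeps the comparison byte-wise regardless of the user's locale.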

The following bash:

for i in $(seq 1 10); do
  echo "$(date '+%s.%N').$(uuidgen -t)"
done

Results in:

1424718695.481439000.c8fef5d4-bb8f-11e4-92c7-00215e673861
1424718695.484130000.c8ff5eb6-bb8f-11e4-ae12-00215e673861
1424718695.486718000.c8ffc2ca-bb8f-11e4-ae15-00215e673861
1424718695.489267000.c90025bc-bb8f-11e4-a624-00215e673861
1424718695.491803000.c90089f8-bb8f-11e4-95ac-00215e673861
1424718695.494381000.c900ed76-bb8f-11e4-9058-00215e673861
1424718695.496899000.c901513a-bb8f-11e4-8018-00215e673861
1424718695.499460000.c901b440-bb8f-11e4-b382-00215e673861
1424718695.502007000.c90217a0-bb8f-11e4-89cd-00215e673861
1424718695.504532000.c90279d4-bb8f-11e4-b515-00215e673861

These file names appear as though they would suffice... but my fear is that I can't promise the names will be alpha-numerically sequential IF two files are created within the same nanosecond (think of a large-scale enterprise system with tens of cores running many concurrent users). At that point I'm relying solely on the UUID algorithm for the ordering, and all the UUID algorithm promises is uniqueness, not "alpha-numeric-sequential-ness".

Any ideas for a method that can guarantee uniqueness AND alpha-numeric sequential order? Because we're dealing with large enterprise systems, I need to keep my requirements as old-school as possible, but I can probably swing some older versions of Python and whatnot if a solution in pure bash isn't readily available.

slumtrimpet
  • Is there any concurrency involved here? How much does it actually matter if two files in the same nanosecond don't have a sort order that corresponds to their relative creation time? – Martijn Pieters Feb 23 '15 at 19:25
  • I like to think it actually doesn't matter... but yes, there is concurrency, and it's important that we always order the events by the exact order in which they are processed on the system (we are dealing with enterprise data replication and auditing). In practice, how likely are we to really have a collision here? I'd say not likely at all... probably. But if there's any solution available I'd love to implement it rather than leave the possibility open. – slumtrimpet Feb 23 '15 at 19:33
  • I thought I had a solution for you, only to come to the realisation that the time portion of an RFC 4122 UUID is stored with the *low* half first, so the first 4 octets rapidly repeat, making the whole UUID concept not lexicographically ordered in time. – Martijn Pieters Feb 23 '15 at 19:53
  • Can you provide a strict ordering for when your events are considered "processed"? Is it when they are first received, or at the very end of processing each request, or somewhere in the middle? What do you want to do if an event being processed both arrived earlier and is finished later than another event? Can you even generate a UUID-like identifier in a nanosecond or less? – twalberg Feb 23 '15 at 21:06
  • @twalberg The events are actually fired via DB triggers, so our event happens at the single instant the DB transaction is committed. As far as collisions go, yes, it's definitely a multi-threaded environment, so it's possible two UUIDs could be generated on two separate threads in the same nanosecond. – slumtrimpet Feb 23 '15 at 21:19
  • @slumtrimpet Really? Are you aware that even on a 4GHz processor, assuming that you can reliably complete 2 instructions per clock tick (which is not likely to be sustainable), a nanosecond gives you just enough time for about 8 instructions? I don't think you can generate a full UUID in a nanosecond... Millisecond, probably. Microsecond, maybe, but that's getting iffy. But not nanosecond. – twalberg Feb 23 '15 at 21:44
  • @twalberg We are dealing with tens of ~4 GHz cores in an enterprise server. – slumtrimpet Feb 24 '15 at 12:07
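
Martijn's point can be seen by pulling the timestamp back out of a time-based UUID. Per RFC 4122, a version 1 UUID holds a 60-bit count of 100 ns intervals since 1582-10-15, split as time_low (first group, 32 bits), time_mid (second group, 16 bits) and time_hi (third group, below the version nibble). The sketch below is hypothetical (the function name is made up) and assumes 64-bit shell arithmetic, as in bash:

```shell
#!/bin/sh
# Hypothetical helper: decode the 60-bit timestamp of an RFC 4122 version 1
# UUID and print it as whole seconds since the Unix epoch.
# Assumes 64-bit shell arithmetic. The ( ) body runs in a subshell,
# so the variables below do not leak into the caller.
v1_unix_seconds() (
    time_low=${1%%-*}           # group 1: low 32 bits of the timestamp
    rest=${1#*-}
    time_mid=${rest%%-*}        # group 2: middle 16 bits
    rest=${rest#*-}
    time_hi_ver=${rest%%-*}     # group 3: version nibble + high 12 bits
    # Reassemble most-significant first, masking off the version nibble.
    ts=$(( ((0x$time_hi_ver & 0x0FFF) << 48) | (0x$time_mid << 32) | 0x$time_low ))
    # 0x01B21DD213814000 = 100 ns intervals from 1582-10-15 to 1970-01-01.
    echo $(( ($ts - 0x01B21DD213814000) / 10000000 ))
)

# time_low counts 100 ns ticks, so it wraps every 2^32 * 100 ns.
echo "time_low wraps every $(( (1 << 32) / 10000000 )) s"

v1_unix_seconds c8fef5d4-bb8f-11e4-92c7-00215e673861
```

For the first UUID in the question's output this prints 1424718695, the same second as the `date` prefix on that line. It also shows why the raw string does not sort by time: the first group is the fastest-changing part of the timestamp and wraps roughly every 429 seconds.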

1 Answer

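Based on another answer, you could reorder the time portions of the UUID so that the most significant value comes first, on down to the least significant. In a time-based (version 1) UUID such as the ones `uuidgen -t` produces, the timestamp is stored as time_low-time_mid-time_hi, which is why the raw strings do not sort by time. Putting the most significant part first is the more "natural" way that, say, UNIX time is presented, and it produces the sort order you are looking for.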

So the following Bash should do the trick in your case:

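for i in $(seq 1 10); do
    # Print UUID fields 3,2,1 (time_hi, time_mid, time_low) first, so the time sorts most-significant first.
    echo "$(date '+%s.%N').$(uuidgen -t | awk -F- -v OFS=- '{print $3, $2, $1, $4, $5}')"
done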
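
To see why the reordering helps, here is the same field swap (done with `awk`) wrapped in a hypothetical `reorder_v1` function and applied to two made-up UUIDs that straddle a rollover of time_low. The values are chosen for illustration and were not produced by `uuidgen`:

```shell
#!/bin/sh
# Hypothetical helper: move the time fields of a version 1 UUID into
# most-significant-first order: time_hi_and_version-time_mid-time_low-clock_seq-node.
reorder_v1() {
    printf '%s\n' "$1" | awk -F- -v OFS=- '{print $3, $2, $1, $4, $5}'
}

# Two made-up UUIDs, listed in creation order, straddling a time_low rollover.
earlier=ffffffff-0001-11e4-8000-00215e673861
later=00000000-0002-11e4-8000-00215e673861

# Byte-wise sorting of the raw UUIDs puts "later" first...
printf '%s\n' "$earlier" "$later" | LC_ALL=C sort
# ...while the reordered forms sort in creation order.
printf '%s\n' "$(reorder_v1 "$earlier")" "$(reorder_v1 "$later")" | LC_ALL=C sort
```

This fixes the ordering of the timestamp itself; on its own it does not settle the same-tick case raised in the question.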

Bear in mind that there are no guarantees: given enough tries and enough time, a collision will occur. If at all possible, you may want to do some sanity checking further down the process chain that can correct any such mistakes before the data gets entered into a permanent record.
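
One possible shape for that downstream check, as a sketch: assume the names are available one per line in the order the events were processed (the function name is hypothetical). `sort -C` checks without printing, and adding `-u` makes equal adjacent lines count as a failure, so the check passes only for strictly increasing, duplicate-free names:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the names on stdin (one per line, in
# processing order) are strictly increasing in byte order.
names_strictly_ordered() {
    # -C: report via exit status only; -u: also fail on equal adjacent lines.
    LC_ALL=C sort -C -u
}

printf '%s\n' \
    1424718695.481439000.c8fef5d4-bb8f-11e4-92c7-00215e673861 \
    1424718695.484130000.c8ff5eb6-bb8f-11e4-ae12-00215e673861 |
    names_strictly_ordered && echo "names are strictly increasing"
```

`-C` is specified in POSIX.1-2008 and supported by GNU coreutils; an older `sort` may only offer `-c`, which performs the same check but prints a diagnostic.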

Dave
  • Apparently `cut` does not print the fields in the order they are listed; it always outputs them in their original order. I had to use `awk` instead to get it right: `uuidgen -t | awk -F- '{OFS="-"; print $3,$2,$1,$4,$5}'`. – Dave Feb 22 '17 at 07:27