4

Previously I was using uuidgen to create unique filenames that I then need to iterate over by date/time via a bash script. I've since found that simply looping over said files via 'ls -l' will not suffice, because evidently I can only trust the OS to keep timestamp resolution in seconds (the nanoseconds field is all zeros when viewing files via stat on this particular filesystem and kernel).

So I then thought maybe I could just use something like date +%s%N for my filename. This prints the seconds since 1970 followed by the nanoseconds part of the current second.

I'm possibly over-engineering this at this point, but these are files generated on high-usage enterprise systems so I don't really want to simply trust the nanosecond timestamp on the (admittedly very small) chance two files are generated in the same nanosecond and we get a collision.
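As a rough illustration of how close together two timestamps taken by one script can be, the sketch below samples date +%s%N twice in a row and prints the gap. It assumes GNU date (the %N format is not available in every date implementation), and the variable names are only for illustration.

```shell
#!/bin/sh
# Sample the clock twice, back to back. Assumes GNU date, which supports %N.
t1=$(date +%s%N)
t2=$(date +%s%N)

# Each sample is a whole number of nanoseconds since the epoch, so the
# difference is roughly the cost of starting and running one `date` process.
delta=$((t2 - t1))
echo "first sample:  $t1"
echo "second sample: $t2"
echo "gap between samples: $delta ns"
```

Within a single script the gap is normally far larger than one nanosecond, because every call has to start a new process. That says nothing about several processes or hosts writing at the same moment, which is the case the question is worried about.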

I believe the uuidgen script has logic baked in to handle this occurrence so it's still guaranteed to be unique in that case (correct me if I'm wrong there... I read that someplace I think but the googles are failing me right now).

So... I'm considering something like

FILENAME="$(date +%s)-$(uuidgen -t)"
echo "$FILENAME"

to ensure I create a unique filename that can then be iterated over with a simple 'ls' and whose name can be trusted to be both unique and sequential by time.

Any better ideas or flaws with this direction?

slumtrimpet
  • Just use the nanoseconds. And please, really think twice about the difference between micro-, milli- and nanoseconds. First check how the underlying OS implements the nanosecond timestamps (e.g. how the RTC is fetched by the OS, and how many instructions the processor could do in one nanosecond...) – clt60 Feb 23 '15 at 15:15
  • I guess it is a generally bad idea but you could replace `uuidgen` with `mktemp` to get shorter names. An obvious disadvantage to that is that you are then flooding your temp folder with zero byte files. – the swine Nov 26 '15 at 15:22

3 Answers

7

If you order your date format by year, month (zero padded), day (zero padded), hour (zero padded), minute (zero padded), then you can sort by time easily:

FILENAME="$(date '+%Y-%m-%d-%H-%M')-$(uuidgen -t)"
echo "$FILENAME"

or

FILENAME="$(date '+%Y-%m-%d-%H-%M')-$(uuidgen -t | head -c 5)"
echo "$FILENAME"

Which would give you:

2015-02-23-08-37-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

or

2015-02-23-08-37-xxxxx
# the same as above, but with a shorter suffix (a UUID truncated to 5 characters is no longer guaranteed to be unique)

You can choose delimiters other than - for the date/time as you wish, as long as they are valid characters for a Linux file name.
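To see why the zero-padded, largest-unit-first layout sorts by time, here is a small sketch that formats three known instants, deliberately out of order, and sorts the resulting names as plain text. It assumes GNU date (for -d @N), adds seconds to the format, and uses a made-up "-example" suffix in place of the UUID.

```shell
#!/bin/sh
# Build names for three known instants, deliberately out of order.
# Assumes GNU date: -d @N reads N as seconds since the epoch, -u prints UTC.
names=""
for epoch in 1424699882 1420070400 1424700000; do
    stamp=$(date -u -d "@$epoch" '+%Y-%m-%d-%H-%M-%S')
    # "-example" is a placeholder for the unique suffix (e.g. a UUID).
    names="$names$stamp-example
"
done

# Zero-padded fields, ordered from year down to second, make plain
# lexicographic order match chronological order.
sorted=$(printf '%s' "$names" | LC_ALL=C sort)
echo "$sorted"
```

Note that this only orders names by their timestamp prefix. Two names with the same prefix fall back to comparing the suffix, which is not required to be chronological.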

Kostanos
lurker
  • Thanks, I think my question is more the uniqueness of the 'uuidgen -t' part and whether it will be sequential for files made in the same second (or whatever resolution/format I end up formatting the date call with). I'll try and clarify the question. – slumtrimpet Feb 23 '15 at 14:00
  • @slumptrimpet OK, your question says, specifically, *unique and sequentially by time*. `uuidgrn` is understood to be unique, but itself may not be sequential in time. At least it's not required to be. – lurker Feb 23 '15 at 14:55
  • @slumtrimpet oops, I meant `uuidgen` of course (got fat-fingered on Android). :) And, of course, seconds and nanoseconds could be included in the date format, `+%Y-%m-%d-%H-%M-%S-%N` or whatever... – lurker Feb 23 '15 at 15:39
  • Yeah... I agree with what I think you are saying. Sure we can timestamp with nanoseconds, and we can append a UUID to ensure uniqueness, but my requirement to have alpha-numerically sequential UUID's is out of the scope of what UUID's actually are. I might post a new question about UUID algorithms to see if I can find one that is sequential but for now I'll accept this one as I think it answers my original question. – slumtrimpet Feb 23 '15 at 19:01
  • @slumptrimpet yeah go for it. It's an interesting question. – lurker Feb 23 '15 at 19:10
2

You will need %N for precision (nanoseconds):

filename=$(date +%s.%N)_$(uuidgen -t); echo $filename
1424699882.086602550_fb575f02-bb63-11e4-ac75-8ca982a9f0aa

BTW if you use %N and you're not using multiple threads, it should be unique enough.
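If several writers are involved, one option is to not rely on the name alone and instead have the shell refuse to reuse an existing name. The following is only a sketch of that idea, not something from the answer above: it assumes bash, GNU date and uuidgen, and create_unique is a hypothetical helper name. It uses the shell's noclobber option so that a colliding name makes the redirection fail and the loop tries again.

```shell
#!/usr/bin/env bash
# Create a new, empty file whose name starts with a nanosecond timestamp.
# Assumes GNU date (%N) and uuidgen; create_unique is a hypothetical helper.
create_unique() {
    local dir=$1 name attempt
    for attempt in 1 2 3 4 5; do
        name="$dir/$(date +%s.%N)_$(uuidgen -t)"
        # With noclobber set, ">" fails instead of overwriting when the file
        # already exists, so a name collision triggers another attempt.
        if ( set -o noclobber; : > "$name" ) 2>/dev/null; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1   # gave up, e.g. because the directory is not writable
}

workdir=$(mktemp -d)
created=$(create_unique "$workdir")
echo "created: $created"
```

The caller then writes its data into the file named by $created. The retry count of five is arbitrary and only there to keep the loop from spinning forever on an unwritable directory.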

Tiago Lopo
  • Thanks for the answer. I updated the question a bit. This is a multi-threaded enterprise system so my fear is the %N isn't enough. – slumtrimpet Feb 23 '15 at 14:07
  • so nanoseconds + uuid will be enough for sure – Tiago Lopo Feb 23 '15 at 14:12
  • Explanation for last para: the CPU instructions required to query the time will themselves take multiple nanoseconds. Which is why this assumption wouldn't work multithreaded. It also assumes you're getting true nanosecond resolution on your time value. – thomasrutter Nov 07 '17 at 22:04
1

You could take what Tiago said about %N precision and combine it with taskset (you can find some info here: http://manpages.ubuntu.com/manpages/hardy/man1/taskset.1.html), and then run your script:

taskset --cpu-list 0 my_script

Never tested this, but it should run your script only on the first core of your CPU (taskset numbers CPUs from 0). I'm thinking that if your script runs on a single CPU core, combined with date +%N (nanoseconds) + uuidgen, there's no way you can get duplicate filenames.

candymanuu