
I have a command which is attempting to generate UUIDs for files:

find -printf "%P\n"|sort|xargs -L 1 echo $(uuid)

But in the result, xargs is only executing the $(uuid) subshell once:

8aa9e7cc-d3b2-11e4-83a6-1ff1acc22a7e file1
8aa9e7cc-d3b2-11e4-83a6-1ff1acc22a7e file2
8aa9e7cc-d3b2-11e4-83a6-1ff1acc22a7e file3

Is there a one-liner (i.e., not a function) to get xargs to execute a subshell command for each input line?

adelphus
  • @TomFenech: `-n 1` would actually split by any whitespace, whether line-interior or not, so the command would break with paths with embedded whitespace; `-L 1` comes closer to the intent, in that it performs line-by-line processing, but word-splitting is still applied to each line, so that potentially _multiple_ arguments are passed to `echo` per input line (which may or may not cause problems). The robust approach is to use `-I`, as in the accepted answer. – mklement0 Mar 26 '15 at 13:24

3 Answers


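This is because the current shell expands $(uuid) once, before xargs even starts, so xargs only ever sees the resulting string. You could explicitly call a shell for each input line, so that the expansion happens there: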

find -printf "%P\n" | sort | xargs -I '{}' bash -c 'echo "$(uuid) $1"' _ '{}'
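To see the difference in isolation, here is a minimal sketch. Since `uuid` may not be installed everywhere, it assumes a hypothetical stand-in function, `new_id`, that prints a fresh random hex string on each call; the three filenames are made up as well.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `uuid`: prints 16 random hex digits per call.
new_id() { od -An -N8 -tx1 /dev/urandom | tr -d ' \n'; echo; }
export -f new_id   # make the function visible to child bash processes

# Unquoted $(...) is expanded once by the calling shell, before xargs starts,
# so every output line carries the same ID.
once=$(printf '%s\n' file1 file2 file3 | xargs -L 1 echo $(new_id))

# Inside single quotes, $(...) is left for each child bash to expand,
# so every output line gets its own ID. The filename arrives as $1.
each=$(printf '%s\n' file1 file2 file3 |
  xargs -I '{}' bash -c 'echo "$(new_id) $1"' _ '{}')

printf '%s\n' "$once" "$each"
```

The first pipeline prints one ID three times; the second prints three different ones. Passing the filename as `$1`, rather than embedding `{}` in the shell code, also keeps names containing quotes or `$` from being interpreted as shell syntax.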

Btw, I would use the following command:

find -exec bash -c 'echo "$(uuid) ${1#./}"' -- '{}' \;

without xargs.
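The argument handling in that `find -exec` command can be checked without `find` or `uuid`. This is a small sketch using a made-up path: the first argument after the command string becomes `$0` (here the placeholder `--`), the path becomes `$1`, and `${1#./}` strips a leading `./`.

```shell
#!/usr/bin/env bash
# './dir/file 1' is a made-up path standing in for what find would substitute for {}.
out=$(bash -c 'printf "%s\n" "$0" "$1" "${1#./}"' -- './dir/file 1')
printf '%s\n' "$out"
```

The three printed lines are `$0`, the path as received, and the path with the `./` prefix removed.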

hek2mgl
  • Nicely done; but not only is `-n 1` superfluous, because `-I` implies line-by-line processing, `-n 1` would actually split by _any_ whitespace, whether line-interior or not. While `-L 1` does perform line-by-line processing, word-splitting is still applied to each line, whereas `-I` treats the entire line as a _single_ argument. – mklement0 Mar 26 '15 at 13:13

hek2mgl's answer explains the problem well and his solution works well; this answer looks at performance.

The accepted answer is a tad slow, because it creates a bash process for every input line.

While xargs is generally preferable to and faster than a shell-code loop, in this particular case the roles are reversed, because shell functionality is needed in each iteration.

The following alternative solution uses a while loop to process the input lines, and, on my machine, is about twice as fast as the xargs solution.

find . -printf "%P\n" | sort | while IFS= read -r f; do echo "$(uuid) $f"; done

If you're concerned about filenames with embedded newlines (very rare) and use GNU utilities, you could use NUL bytes as separators:

find . -printf "%P\0" | sort -z | while IFS= read -d '' -r f; do echo "$(uuid) $f"; done
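The NUL-delimited loop can be tried without touching the file system. The sketch below is an illustration under assumptions: `new_id` is a hypothetical stand-in for `uuid`, the names are made up, and `sort -z` requires GNU (or a recent BSD) `sort`. It feeds the loop through process substitution rather than a pipe, so the loop runs in the current shell and `count` survives it.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `uuid`: prints 16 random hex digits per call.
new_id() { od -An -N8 -tx1 /dev/urandom | tr -d ' \n'; echo; }

# Made-up names, including one with a space and one with an embedded newline.
names=('plain' 'with space' $'with\nnewline')

count=0
while IFS= read -r -d '' f; do
  # %q prints the name in shell-quoted form, so the embedded newline stays visible.
  printf '%s %q\n' "$(new_id)" "$f"
  count=$((count + 1))
done < <(printf '%s\0' "${names[@]}" | sort -z)

echo "$count names processed"
```

Each name reaches the loop body intact, as a single value of `f`, regardless of the whitespace it contains.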

Update: The fastest approach is to not use a shell loop at all, as evidenced by ᴳᵁᴵᴰᴼ's clever answer. See below for a portable version of his answer.


Compatibility note:

The OP's find command implies the use of GNU find (Linux), and uses features (-printf) that may not work on other platforms.

Here's a portable version of ᴳᵁᴵᴰᴼ's answer that uses only POSIX-compliant features of find (and awk).
Note, however, that uuid is not a POSIX utility; since Linux and BSD-like systems (including OSX) have a uuidgen utility, the command uses that instead:

 find . -exec printf '%s\t' {} \; -exec uuidgen \; | 
   awk -F '\t' '{ sub(/.+\//,"", $1); print $2, $1 }' | sort -k2
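To see what the `awk` and `sort` stages do, they can be run on canned input. In this sketch the tab-separated lines imitate what the two `-exec` actions emit per file; the paths and the `ID-1`/`ID-2` placeholders are made up for illustration.

```shell
#!/usr/bin/env bash
# Each input line: <path> TAB <id>, as produced by printf '%s\t' {} followed by uuidgen.
out=$(printf './file2\tID-2\n./dir/file1\tID-1\n' |
  awk -F '\t' '{ sub(/.+\//, "", $1); print $2, $1 }' | sort -k2)
printf '%s\n' "$out"
```

Note that `sub(/.+\//, "", $1)` removes everything up to the last slash, so only the final path component is printed, whereas GNU find's `%P` keeps the path relative to the starting directory.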
mklement0

With a for loop:

for i in $(find -printf "%P\n" | sort) ; do echo "$(uuid) $i";  done

Edit: another way to do this:

find -printf "%P\0" -exec uuid -v 4 \; | sort | awk -F'\0' '{ print $2 " " $1}'

This outputs each filename followed by its UUID on the same line, separated by a NUL byte (no subshell required), so that sorting by filename can happen first; awk then swaps the two NUL-separated columns.

guido
  • This also works and is a slightly easier version to read as well as not having the overhead of a new bash on every argument. If I could split the credit, I would. Thanks. – adelphus Mar 26 '15 at 12:51
  • Using a shell loop in this instance is a good idea for performance reasons, but it's better to use a `while` loop, because `for` will break with filenames with embedded spaces, for instance - see http://mywiki.wooledge.org/DontReadLinesWithFor – mklement0 Mar 26 '15 at 14:02
  • @mklement0 That's very true, thanks; anyway, I decided that the version that discards the loop is better. – guido Mar 26 '15 at 14:58
  • Nicely done - that's even faster. As an aside: what platform are you (and the OP) on that you have a `uuid` utility? On BSD-like systems and Linux it's `uuidgen`. Interestingly, BSD `awk` interprets `-F'\0'` as `-F ''` (i.e., the _empty string_) and therefore splits the lines into individual characters (however, the `find` command as written wouldn't work with BSD `find` anyway). – mklement0 Mar 26 '15 at 17:51
  • @mklement0 it is this program http://www.ossp.org/pkg/lib/uuid/ packaged for fedora in my case; and GNU findutils 4.5.12 – guido Mar 26 '15 at 22:56
  • @mklement0 ...and on mine, uuid is in the same findutils package in Ubuntu. Interestingly, the uuid utility creates time-based IDs whereas uuidgen creates random-based IDs (by default). This results in strikingly different output when run in a loop - uuid creates sets of very similar IDs, uuidgen creates more randomised values. – adelphus Mar 31 '15 at 14:49
  • @adelphus: Thanks for that; let me add: _GNU_ `uuidgen` allows explicit control of what type to generate: `-r` for random-based, `-t` for time-based. _BSD_ `uuidgen`, by contrast, doesn't support this, and seemingly _invariably_ creates _random_-based ones. – mklement0 Mar 31 '15 at 15:13