40

I'm pretty much confused on this. Need some clarifications.

Example 1 :

pgrep string | xargs ps

Example 2 :

find . | xargs grep whatever

From Example 1, I gather it's this way:

Search for "string" in the names of running processes and pass the process IDs of all the matches to 'xargs ps', which appends those IDs to ps to get the same output as:

ps <processid>

Can someone explain what xargs really does in this case?

From Example 2, I gather it's this way:

It's to search for some "string" recursively from the current working directory. Here, how exactly does 'xargs' work?

I was of the opinion that 'xargs' repeatedly appends data from standard input to the 'argument' given to xargs (which usually is a UNIX command by itself).

From the xargs(1) man page:

xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (default is /bin/echo) one or more times with any initial-arguments followed by items read from standard input. Blank lines on the standard input are ignored.

halluc1nati0n
  • 1
    xargs acts like "command substitution" (at least with Bash). It turns multiline (vertical) results into a one-line (horizontal) list of arguments. (Note that you might filter the results a little, using sed for example, before passing them through xargs.) Plus, xargs handles the "Argument list too long" error that may occur before Linux kernel 2.6.23 (see [wikipedia](https://en.wikipedia.org/wiki/Xargs)). Here is [another helpful thread](http://unix.stackexchange.com/questions/24954/when-is-xargs-needed) – Stphane Oct 29 '15 at 13:40

5 Answers

60

In general xargs is used like this

prog | xargs utility

where prog is expected to output one or more newline- or space-separated results. The trick is that xargs does not necessarily call utility once for each result; instead it splits the results into sublists and calls utility once for every sublist. If you want to force xargs to call utility once per input line, invoke it with xargs -L1 (or with xargs -n 1 for one call per item).
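A minimal sketch of that batching behaviour, with printf standing in for prog and echo standing in for utility (both are placeholders picked for illustration, not part of the original question):

```shell
#!/bin/sh
# Default: xargs packs as many items as fit into a single call of echo.
batched=$(printf 'a\nb\nc\n' | xargs echo)
echo "$batched"

# -n 1: one call of echo per item, so each item lands on its own line.
per_item=$(printf 'a\nb\nc\n' | xargs -n 1 echo)
echo "$per_item"

# -L 1: one call per input line; a line holding two items stays together.
per_line=$(printf 'a b\nc\n' | xargs -L 1 echo)
echo "$per_line"
```

The difference between -L and -n only shows up when a single input line contains more than one item, as in the last case.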

Note that xargs promises you that the sublist sent to utility is shorter than ARG_MAX. (If you're curious, you can get the current value of ARG_MAX using getconf ARG_MAX.) This is how it avoids those dreaded "Argument list too long" errors.
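The splitting into sublists is hard to observe with the real limit, so here is a rough sketch that forces small sublists with -n 4. The cap of 4 is an arbitrary value chosen for the demo; xargs normally sizes the sublists itself.

```shell
#!/bin/sh
# System limit (in bytes) on arguments plus environment for a new process.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX is $arg_max bytes"

# Cap each call at 4 items so the sublists become visible:
# echo is run three times, for 1-4, 5-8 and 9-10.
chunks=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | xargs -n 4 echo)
echo "$chunks"
```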

Choylton B. Higginbottom
Lars Tackmann
  • 1
    Well, this is something I could relate to, but it's become hell a lot more confusing now. I know the basic reason why xargs is there, but it gets convoluted when I see it being used for more than one purpose (in different ways).. – halluc1nati0n Dec 15 '09 at 08:42
  • 7
    Consider this command "find /etc -type d -depth 1 | xargs echo" which prints all the directories in the /etc folder (but not their sub directories). As echo takes multiple arguments the result is one long line "/etc/dir1 /etc/dir2…". If you instead call "find /etc -type d -depth 1 | xargs -L1 echo" then echo gets invoked once for each result, resulting in each directory from /etc getting printed on a line by itself. – Lars Tackmann Dec 15 '09 at 12:11
  • How do I pass commands or options to the utility? Say, I want to run uglifyjs and specify and output folder for the input coming into xargs? http://stackoverflow.com/questions/43149786/how-to-process-files-in-nested-directories – Costa Michailidis Apr 01 '17 at 19:30
17

A good example of what xargs does is to try getting sorted checksums for every file in a directory using find.

find . | cksum  | sort

returns just one checksum, and it's not clear what it's the checksum for. Not what we want. The pipe sends the stdout from find into stdin for cksum. What cksum really wants is a list of command line args, e.g.

cksum file001.blah file002.blah  file003.blah

will report three lines, one per file, with the desired checksums. Xargs does the magic trick - converting stdout of the previous program to a temporary and hidden command line to feed to the next. The command line that works is:

find . | xargs cksum | sort

Note no pipe between xargs and cksum.
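A self-contained sketch of the same pipeline, run against a scratch directory so it can be tried safely. The file names are made up for the demo, and it assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do); these keep file names containing spaces intact, and -type f skips directories.

```shell
#!/bin/sh
# Scratch directory with two small files (names are made up for the demo).
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/file001.blah"
printf 'world\n' > "$demo_dir/file 002.blah"   # note the space in the name

# find emits NUL-separated names; xargs -0 turns them into arguments of cksum.
sums=$(find "$demo_dir" -type f -print0 | xargs -0 cksum | sort)
echo "$sums"

rm -rf "$demo_dir"
```

Each output line holds a checksum, a byte count and the file name, one line per file.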

DarenW
  • 16,549
  • 7
  • 63
  • 102
  • 2
    btw, this is the main ingredient in my recipe for finding duplicated files in two or more directories, even if their names are different. – DarenW Dec 13 '09 at 22:48
  • Thanks for the insight. Adding --verbose to xargs shows the command being run: find . | xargs --verbose cksum | sort – kumar Nov 25 '16 at 06:59
  • `find . | grep / | xargs cksum | sort` may be used to avoid the unwanted output `cksum: .: Is a directory` – Jarvis Apr 25 '19 at 08:44
  • 1
    @Jarvis Better yet change the find to: *`find . \! -type d`*. – Pryftan Sep 11 '19 at 15:59
8
$ echo 'line1
> line2
> line3
> ...
> lineN ' | xargs cmd1 -a -b

will result in:

$ cmd1 -a -b line1 line2 line3 ... lineN

xargs will break cmd1 ... into several executions of cmd1 if the argument list gets too long.
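One way to see the command lines xargs would build, without running anything, is to put echo in front of the command. In this sketch cmd1, -a and -b are just the placeholder names from above, and -n 2 is an artificial cap used to trigger the split:

```shell
#!/bin/sh
# Prefix the command with echo to print what xargs would execute.
preview=$(printf 'line1\nline2\nline3\n' | xargs echo cmd1 -a -b)
echo "$preview"

# With at most 2 items per call, the initial arguments are repeated
# for every execution.
split_preview=$(printf 'line1\nline2\nline3\n' | xargs -n 2 echo cmd1 -a -b)
echo "$split_preview"
```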

xargs may be used for many other tasks related to passing stdin lines as positional arguments. Take a look at the capital -P option in xargs(1) for running several instances of a command in parallel.
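A small sketch of -P, assuming an xargs that supports it (it is an extension found in GNU and BSD xargs). The sh -c '...' _ wrapper is only a stand-in for a real, slower command:

```shell
#!/bin/sh
# -P 2 keeps up to two commands running at once; -n 1 gives each one item.
# Completion order is not guaranteed, hence the final sort.
results=$(printf '%s\n' 1 2 3 4 | xargs -P 2 -n 1 sh -c 'echo "done $1"' _ | sort)
echo "$results"
```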

Andrey Vlasovskikh
  • 16,489
  • 7
  • 44
  • 62
4
#!/bin/sh
# Script to echo out the arguments one at a time.
# "$@" keeps each argument intact, even if it contains spaces.
for a in "$@"
do
    echo "$a"
done

the command

$ sh myscript 1 2 3 4 5

will yield

1
2
3
4
5

but

$ sh myscript 1 2 3 4 5 6 7 8 9 10 11

would fail if the maximum number of parameters were exceeded (the real limit is far higher and is measured in bytes, see getconf ARG_MAX, but let's pretend it's 10 for this example).

To get around this we could use

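#!/bin/sh
# Script to echo out the arguments one at a time.
# Each parameter may hold several space-separated items;
# xargs -n 1 splits them and runs echo once per item.
for a in "$@"
do
    echo "$a" | xargs -n 1 echo
done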

we could then run it like this

$ sh myscript "1 2 3 4 5" "6 7 8 9 10 11"

and get the correct result since there are just 2 parameters

Paul Creasey
  • 28,321
  • 10
  • 54
  • 90
  • 2
    I don't know what the max is, but it's definitely not 10. And you can use $@ instead of $*. This is not a very good example of how to use xargs – ghostdog74 Dec 14 '09 at 00:40
  • @ghostdog74 As for that try: *`getconf ARG_MAX`*. This is something C programmers know about of course but anyway: '# bytes of args + environ for exec()'. On Linux it's #defined in *`/usr/include/linux/limits.h`*. And yes to your other points. – Pryftan Sep 11 '19 at 16:16
  • As for the answerer, Paul: I feel that your answer could be slightly improved by adding a space between the '$' and the 'sh'. Or even removing the '$' since it's not part of the command. At first glance (in my case poor vision + being exhausted on top of it) it looks like it's shell variable. Also you should quote the bash variables in the script. Also as @ghostdog74 points out you should change it to "$@". Cheers. – Pryftan Sep 11 '19 at 16:20
2

xargs is normally used to group arguments together so that you don't get a "too many arguments" error, which occurs when you pass a large number of arguments to a command.

ennuikiller
  • 46,381
  • 14
  • 112
  • 137