
I am using GNU xargs (version 4.2.2) in parallel mode and I seem to be reliably losing output when redirecting to a file. When redirecting to a pipe, it appears to work correctly.

The following shell commands demonstrate a minimal, complete, and verifiable example of the issue. I generate 2550 numbers and use xargs to split them into lines of 100 args each, for a total of 26 lines, where the 26th line contains only 50 args.

# generate numbers 1 to 2550 where each number is on its own line
$ seq 1 2550 > /tmp/nums
$ wc -l /tmp/nums
2550 /tmp/nums

# piping to wc is accurate: 26 lines, 2550 args
$ xargs -P20 -n 100 </tmp/nums | wc
     26    2550   11643

# redirecting to a file is clearly inaccurate: 22 lines, 2150 args
$ xargs -P20 -n 100 </tmp/nums >/tmp/out; wc /tmp/out
     22  2150 10043 /tmp/out

I believe the problem is not related to the underlying shell, since the shell performs the redirection before the command executes and waits for xargs to complete. I hypothesize that xargs exits before the buffer is flushed. However, if my hypothesis is correct, I do not know why the problem doesn't manifest when writing to a pipe.

Edit:

When using >> (create/append to file) in the shell, the problem doesn't seem to manifest:

# appending to file
$ >/tmp/out
$ xargs -P20 -n 100 </tmp/nums >>/tmp/out; wc /tmp/out
     26    2550   11643

# creating and appending to file
$ rm /tmp/out
$ xargs -P20 -n 100 </tmp/nums >>/tmp/out; wc /tmp/out
     26    2550   11643
snap
  • I am getting accurate output in both cases. `Shell> wc -l /tmp/nums 2550 /tmp/nums Shell> xargs -P20 -n 100 xargs -P20 -n 100 /tmp/out; wc /tmp/out 26 2550 11643 /tmp/out Shell> ` – Sriharsha Kalluru Sep 08 '15 at 06:25
  • Do you get the correct result reliably if you empty the output file and then use `>>` instead of `>` redirection? If so, there's some sort of an explanation. – Jonathan Leffler Sep 08 '15 at 06:31
  • @JonathanLeffler: Looks like you're right. With `>>` the problem doesn't manifest. I tried creating the file ahead of time and redirecting with `>` (truncating the existing file), and the problem seems to reappear. – snap Sep 08 '15 at 06:38
  • When you use the `>` redirection, what numbers appear at the start of `/tmp/out`? Are they numbers like 1, 2, 3, or are they numbers like 2001, 2002, 2003? I'm having some problems coming up with a plausible mechanism for the trouble. The pipe and append behaviour is easy enough to explain. But the behaviour with `>` should be essentially the same, and I'm left wondering how things get broken. Do you have `truss` or `strace` available? If so, it might be instructive to look at what the `xargs` process does (but not, at least in the first place, what its children do). _[…continued…]_ – Jonathan Leffler Sep 08 '15 at 07:07
  • _[…continuation…]_ Is there any useful information in `xargs.log` after you run `strace -o xargs.log xargs -P 20 -n 100 </tmp/nums >/tmp/out`? I'm thinking of something like an `lseek()` on file descriptor 1, but I'm not sure how plausible that is. One problem may be that it is in fact a child that is causing the mischief; in that case, you'd need to use the 'follow children' option (`-f`) to see what's causing the trouble. But the output would be a lot more voluminous. I get the 'correct' output on both Mac OS X 10.10.5 and Ubuntu 14.04 LTS (running in a VM under Mac OS X). – Jonathan Leffler Sep 08 '15 at 07:12
  • Thanks for the suggestion to use `strace`. I'm analyzing the output now. The problem is happening on Ubuntu 14.04 LTS (also in a VM), but I noticed it is more apparent on some systems than on others. The problem shows up quickly in a crude while loop: `seq 1 2550 > /tmp/nums; while true; do xargs -P20 -n 100 </tmp/nums >/tmp/out; wc /tmp/out; done | grep -v ' 26 '`. I also tried this on OS X 10.10.4 and was unable to make the problem manifest there. [1/2] – snap Sep 08 '15 at 07:39
  • However, I just found a [similar problem](http://stackoverflow.com/questions/31926950/explicit-sort-parallelization-via-xagrs-incomplete-results-from-xargs-max-p). It definitely might be due to `xargs` parent process detaching from children, exiting early, while a child (that inherits and writes to stdout) is not flushing the buffer. I'll examine the strace output to see if I can pinpoint where this is happening. [2/2] – snap Sep 08 '15 at 07:41
  • I would like to read more questions like yours! – wap26 Sep 08 '15 at 07:57

2 Answers

Your problem is due to the output from different processes being mixed. It is shown here:

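# create six files of about 20 MB each; every file is one long line ("1a", "1b", ... repeated)
parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} ::: a b c d e f
ls -l a b c d e f
# GNU Parallel: output is grouped per job, -k keeps it in input order
parallel -kP4 -n1 grep 1 > out.par ::: a b c d e f
# xargs, 4 jobs in parallel, grep using its default buffering
echo a b c d e f | xargs -P4 -n1 grep 1 > out.xargs-unbuf
# xargs, 4 jobs in parallel, grep flushing after every line
echo a b c d e f | xargs -P4 -n1 grep --line-buffered 1 > out.xargs-linebuf
# xargs, one job at a time, as a reference
echo a b c d e f | xargs -n1 grep 1 > out.xargs-serial
# compare sizes and checksums of the results
ls -l out*
md5sum out*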
The solution is to buffer the output from each job - either in memory or in tmpfiles (like GNU Parallel does).
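
As a rough illustration of the tmpfile approach with plain xargs, here is a minimal sketch in which every job writes to its own file and the pieces are concatenated afterwards. The directory variable `outdir`, the `.part` suffix and the `combined` file name are made up for this example, and this is not a description of how GNU Parallel implements its buffering internally.

```shell
#!/bin/sh
# Sketch only: give each xargs job a private output file, then merge.
# outdir, the .part suffix and "combined" are hypothetical names.
outdir=$(mktemp -d)

# xargs runs: sh -c '<script>' "$outdir" <up to 100 numbers>
# so inside the script $0 is the directory and "$@" is the batch.
# $1 (the first number of the batch) makes the file name unique.
seq 1 2550 |
  xargs -P20 -n 100 sh -c 'echo "$@" >"$0/$1.part"' "$outdir"

# Merge once all jobs have finished; no two writers share a file.
cat "$outdir"/*.part >"$outdir/combined"

lines=$(wc -l <"$outdir/combined" | tr -d ' ')
words=$(wc -w <"$outdir/combined" | tr -d ' ')
echo "lines=$lines words=$words"

rm -rf "$outdir"
```

Note that the glob expands in lexical order, so the merged lines are not in numeric order; only the totals are expected to match the serial run.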

Ole Tange
  • I agree that there is absolutely no control over the mixing of output on stdout (unless the `write()`s by the child process are bounded in size, atomic, and the application allows output mixing), but this doesn't explain _losing_ input, which is happening in both my example and yours. I actually switched to parallel because of the output grouping. – snap Sep 08 '15 at 22:34
  • That is due to multiple file descriptors being open for the same file: if they write one after another, there is no problem. If they write simultaneously, they will be writing to the same positions in the file. This also explains why you do not see the issue if you redirect to a pipe instead of a file: there is no file position in a pipe. It also explains why `>>` does not cause the behaviour. – Ole Tange Sep 08 '15 at 22:52
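
The effect described in the last comment can be shown with a minimal sketch that deliberately opens the same file twice, once with `>` and once with `>>`. This assumes two independent opens of the file; it only illustrates the general mechanism and does not establish that xargs and its children end up in this situation. The variable names are hypothetical.

```shell
#!/bin/sh
# Sketch only: two independent opens of one file, '>' versus '>>'.
# tmp, trunc_bytes and append_bytes are hypothetical names.
tmp=$(mktemp)

# '>' mode: fd 3 and fd 4 each have their own offset, both starting at 0.
(
  exec 3>"$tmp" 4>"$tmp"
  printf 'AAAA\n' >&3   # 5 bytes written at offset 0 via fd 3
  printf 'BB\n'   >&4   # 3 bytes written at offset 0 via fd 4, over "AAA"
)
trunc_bytes=$(wc -c <"$tmp" | tr -d ' ')

# '>>' mode: O_APPEND places every write at the current end of the file.
: >"$tmp"
(
  exec 3>>"$tmp" 4>>"$tmp"
  printf 'AAAA\n' >&3
  printf 'BB\n'   >&4
)
append_bytes=$(wc -c <"$tmp" | tr -d ' ')

echo "truncate mode: $trunc_bytes bytes, append mode: $append_bytes bytes"
rm -f "$tmp"
```

Eight bytes are written in both cases, but in truncate mode the second write overwrites part of the first, so fewer bytes survive in the file.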

I know this question is about xargs, but if you keep on having issues with it, then perhaps GNU Parallel may be of help. Your xargs invocation would translate to:

$ < /tmp/nums parallel -j20 -N100 echo > /tmp/out; wc /tmp/out
26  2550 11643 /tmp/out