
I have a script that uses while read to process a file line by line.

When I do:

head -n5 file1 | ./myscript.sh

I get the expected results.

But when I try to parallelize it with GNU Parallel:

head -n5 file1 | parallel -j 4 ./myscript.sh

the result file is empty!?

I also tried:

parallel -j 4 -a file1 ./myscript.sh

but it still doesn't work. I tried to follow the examples in the documentation, without success. What am I doing wrong?

EDIT:

Maybe this can help:

head -n5 file1 | parallel -a - -j 4 echo #this works
head -n5 file1 | parallel -a - -j 4 ./myscript.sh #this doesn't
zx8754
branquito

2 Answers


parallel doesn't send each line of input to the stdin of the command you give it; instead, it appends the line to that command as an argument.

As you've written it, you're effectively calling ./myscript.sh <INPUT>, whereas you want to call ./myscript.sh and send the input on stdin.
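A minimal sketch of the difference, using a hypothetical process_lines function as a stand-in for a while read script like myscript.sh. It only ever reads stdin, so a line that arrives as an argument is never seen:

```shell
#!/bin/sh
# Hypothetical stand-in for a "while read" script such as myscript.sh:
# it processes lines from stdin and never looks at its arguments.
process_lines() {
  while IFS= read -r line; do
    printf 'processed: %s\n' "$line"
  done
}

# Input on stdin: both lines are processed.
printf 'first\nsecond\n' | process_lines

# Input as an argument only (what "parallel ./myscript.sh" effectively does):
# stdin is /dev/null here, so the loop reads nothing and nothing is printed.
process_lines "first" < /dev/null
```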

This should work:

head -n5 file1 | parallel -j 4 "echo {} | ./myscript.sh"

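The {} tells parallel where to put the input line, instead of the default position at the end of the command. The quotes keep your own shell from interpreting the |, so the whole pipeline is handed to parallel, which runs it in a shell once per line.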
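A quick way to check where the input line ends up, assuming GNU Parallel is installed (prefix, suffix and cat are just placeholders for illustration):

```shell
#!/bin/sh
# No {}: the input line is appended, so this runs "echo prefix x" and prints "prefix x".
printf 'x\n' | parallel echo prefix

# With {}: the line replaces {}, so this runs "echo x suffix" and prints "x suffix".
printf 'x\n' | parallel echo {} suffix

# Quoted: the whole string, pipe included, runs in a shell for each line,
# so x reaches cat on stdin and "x" is printed.
printf 'x\n' | parallel "echo {} | cat"
```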

gandaliter
  • If the script is writing to a results file, then it might be overwriting it each time. You need to make it append to the file instead. `parallel` will create a new instance of your script for each of the input lines. – gandaliter Sep 17 '15 at 19:17
  • You were right about appending, but now the counter in my file stays at 1 for every line, because each line is handled by a separate, isolated process :). Anyway, could you explain the situation in my edit above? – branquito Sep 17 '15 at 19:20
  • I don't know what counter you mean; what does the script do? `echo <INPUT>` outputs `<INPUT>`, so your first example should print each line to `stdout`. Your script needs the input to be sent on `stdin`, though, rather than as an argument. – gandaliter Sep 17 '15 at 19:25
  • OK, parallel is working now, but I get repeated and erroneous results. In my script I use `grep` to match words from one file against another, large file. Isn't parallel supposed to split the input into chunks per process so that they don't get mixed? – branquito Sep 17 '15 at 19:34
  • Each instance of the script will be given only one line as input. I don't really understand what you're trying to do with this script. Could you post it? – gandaliter Sep 17 '15 at 19:38
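The counter staying at 1, mentioned in the comments, follows from each job being a separate process that sees only its own line, so any state the script keeps starts fresh every time. A small serial sketch, with a hypothetical count_lines function standing in for a script that keeps a counter:

```shell
#!/bin/sh
# Hypothetical stand-in for a script that counts the lines it reads on stdin.
count_lines() {
  n=0
  while IFS= read -r line; do
    n=$((n + 1))
  done
  echo "$n"
}

# One process per line, each seeing only its own line on stdin
# (what every job of  parallel "echo {} | ./myscript.sh"  gets): prints 1 three times.
printf 'a\nb\nc\n' | while IFS= read -r line; do
  echo "$line" | count_lines
done

# All lines fed to a single invocation: prints 3.
printf 'a\nb\nc\n' | count_lines
```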

--pipe is made for you:

cat file1 | parallel --pipe -N5 ./myscript.sh

But you need to change myscript.sh so that it prints its output to stdout instead of saving it to the result file. Then you can:

cat file1 | parallel --pipe -N5 ./myscript.sh > result

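and avoid any mixing: GNU Parallel collects each job's output and prints it as a whole when the job finishes, so output from different jobs is not interleaved.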
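To see what --pipe -N5 means for the script, here is a serial sketch; it is not how GNU Parallel works internally, and myscript and run_in_chunks are hypothetical names. myscript stands in for a version of myscript.sh that prints to stdout, and run_in_chunks cuts stdin into chunks of at most 5 lines and gives each chunk to one invocation on stdin, which is what each --pipe -N5 job receives. GNU Parallel runs those jobs concurrently instead of one after another.

```shell
#!/bin/sh
# Hypothetical myscript.sh rewritten to print to stdout instead of
# writing to a results file; it also reports how many lines it was given.
myscript() {
  n=0
  while IFS= read -r line; do
    n=$((n + 1))
    printf 'result for %s\n' "$line"
  done
  printf 'job handled %d lines\n' "$n"
}

# Serial sketch of  parallel --pipe -N5 ./myscript.sh : split stdin into
# chunks of at most 5 lines and pass each chunk to one call on stdin.
run_in_chunks() {
  while :; do
    chunk=""
    i=0
    # Stop after 5 lines; the -lt test runs first, so no line is lost.
    while [ "$i" -lt 5 ] && IFS= read -r line; do
      chunk="$chunk$line
"
      i=$((i + 1))
    done
    if [ "$i" -eq 0 ]; then
      break
    fi
    printf '%s' "$chunk" | myscript
  done
}

# 12 lines -> three jobs of 5, 5 and 2 lines, all written to stdout.
seq 1 12 | run_in_chunks
```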

Ole Tange