GNU Parallel | pipe command

Question

I am completely new to GNU parallel and need your advice on running the command below with it:

/home/admin/Gfinal/decoder/decdr.pl --gh --w14b /data/tmp/KRX12/a.bin | 
perl  /home/admin/decout/decoder/flow.pl >> /data/tmp/decodedgfile/out_1.txt

I will run this command on a list of .bin files. What is the best (fastest) way to do that with GNU parallel, given that the output of the first part of the command (/home/admin/Gfinal/decoder/decdr.pl --gh --w14b) is very large (> 2 GB)?

Any help would be appreciated.

score 4 · Accepted Answer · answered Sep 06 '16 at 18:23

Will this work:

parallel /home/admin/Gfinal/decoder/decdr.pl --gh --w14b {} '|' perl  /home/admin/decout/decoder/flow.pl >> /data/tmp/decodedgfile/out_1.txt ::: /data/tmp/KRX12/*.bin

(If the output from flow.pl is more than your disk I/O can handle, try parallel --compress).

Or maybe:

parallel /home/admin/Gfinal/decoder/decdr.pl --gh --w14b {} '|' perl  /home/admin/decout/decoder/flow.pl '>>' /data/tmp/decodedgfile/out_{#}.txt ::: /data/tmp/KRX12/*.bin

It depends on whether you want a single output file or one per input file.
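If you go with one output file per input file, you may prefer names derived from the input rather than the job number. A minimal sketch, assuming GNU parallel is installed: `{/.}` is the replacement string for the input's basename without its extension. Here `cat` and `wc -c` are only stand-ins for decdr.pl and flow.pl, and the temporary directory and sample files are made up for the demo.

```shell
#!/bin/sh
# Demo data: two small .bin files in a scratch directory (hypothetical names).
workdir=$(mktemp -d)
mkdir "$workdir/in" "$workdir/out"
printf 'aaaa' > "$workdir/in/a.bin"
printf 'bb'   > "$workdir/in/b.bin"

# Only run the demo if the 'parallel' on PATH really is GNU parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # {}   = full input path
    # {/.} = basename without extension, so a.bin is written to out/a.txt
    # 'cat' and 'wc -c' stand in for decdr.pl and flow.pl (assumption).
    parallel cat {} '|' wc -c '>' "$workdir/out/{/.}.txt" ::: "$workdir"/in/*.bin
    ls "$workdir/out"
else
    echo "GNU parallel is not installed" >&2
fi
# rm -r "$workdir"   # clean up when finished
```

Adding --dry-run right after `parallel` prints the generated command lines without running them, which is a cheap way to check the quoting before starting on the real files.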

Also spend an hour walking through the tutorial. Your command line will love you for it. man parallel_tutorial

Ole Tange


Thanks a lot for your answer, I really appreciate your work. One more question: would using --pipe after ( /home/admin/Gfinal/decoder/decdr.pl --gh --w14b ) make the process faster? – Helmy Sep 06 '16 at 20:00
Not understood. But try it and measure. – Ole Tange Sep 06 '16 at 20:12
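
For context on what --pipe does: the commands in the answer run one job per input file, while --pipe splits a single stream on stdin into blocks of whole lines and hands each block to a separate job. That can only help if the consumer treats every line independently, and the thread does not say whether flow.pl does. A small sketch of the splitting itself, assuming GNU parallel is installed, with `seq` and `wc -l` as stand-ins for the real programs:

```shell
#!/bin/sh
# Only run the demo if the 'parallel' on PATH really is GNU parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # --pipe      : split stdin into blocks and feed each block to one job
    # --block 100k: aim for roughly 100 kB per block, cut at line boundaries
    # -k          : print the jobs' output in input order
    counts=$(seq 1 100000 | parallel --pipe --block 100k -k wc -l)
    echo "$counts"

    # Each job saw only part of the stream; the parts add up to the whole.
    total=$(echo "$counts" | awk '{ s += $1 } END { print s }')
    echo "total lines: $total"
else
    echo "GNU parallel is not installed" >&2
fi
```

Wrapping both variants of the real pipeline in `time` on a representative .bin file is the simplest way to follow the "try it and measure" advice.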

score 0 · Answer 2 · 2016-09-06T17:04:19.640

Here are some great videos for gnu-parallel / parallel

Ref youtube Part 1: GNU Parallel script processing and execution

Here is a link from the GNU web site for platform specific information.

Ref gnu parallel download information

"Multiple input sources

GNU parallel can take multiple input sources given on the command line. GNU parallel then generates all combinations of the input sources:

parallel echo ::: A B C ::: D E F

Output (the order may be different):

A D
A E
A F
B D
B E
...

The input sources can be files:

parallel -a abc-file -a def-file echo"

Ref GNU-Parallel-Tutorial

With reference to the pipe

Pipe capacity

A pipe has a limited capacity. If the pipe is full, then a write(2) will block or fail, depending on whether the O_NONBLOCK flag is set (see below). Different implementations have different limits for the pipe capacity. Applications should not rely on a particular capacity: an application should be designed so that a reading process consumes data as soon as it is available, so that a writing process does not remain blocked.

In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is 65536 bytes. Since Linux 2.6.35, the default pipe capacity is 65536 bytes, but the capacity can be queried and set using the fcntl(2) F_GETPIPE_SZ and F_SETPIPE_SZ operations. See fcntl(2) for more information.

PIPE_BUF

POSIX.1 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may interleave the data with data written by other processes. POSIX.1 requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is 4096 bytes.) The precise semantics depend on whether the file descriptor is nonblocking (O_NONBLOCK), whether there are multiple writers to the pipe, and on n, the number of bytes to be written:

Ref man7.org pipe

You could have a look at fcntl F_GETPIPE_SZ and F_SETPIPE_SZ operations for more information.

Ref fcntl
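
The fcntl operations need a small program, but two of the limits quoted above can be read straight from the shell. A rough sketch: `getconf PIPE_BUF` takes a path argument (`/` here is just an arbitrary choice), and /proc/sys/fs/pipe-max-size is Linux-specific, so the second part is skipped on systems that do not have it.

```shell
#!/bin/sh
# PIPE_BUF: largest write that is guaranteed to be atomic on a pipe.
# POSIX requires at least 512; the path argument '/' is an arbitrary choice.
pipe_buf=$(getconf PIPE_BUF /)
echo "PIPE_BUF: $pipe_buf bytes"

# Linux only: the largest capacity an unprivileged process may request
# with fcntl(fd, F_SETPIPE_SZ, size).
if [ -r /proc/sys/fs/pipe-max-size ]; then
    echo "pipe-max-size: $(cat /proc/sys/fs/pipe-max-size) bytes"
fi
```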

All the best
