Questions tagged [gnu-parallel]

GNU parallel is a shell tool for executing jobs in parallel using one or more computers.

A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.
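
The two modes described above (one job per input line, and splitting piped input across jobs) can be sketched roughly as follows. This is only an illustration: the file names are placeholders, the 1 kB block size is an arbitrary choice, and the script assumes GNU parallel is installed as `parallel`.

```shell
#!/bin/sh
# Sketch only: the file names below are placeholders, not taken from any question.

# Confirm that the `parallel` on PATH is GNU parallel (moreutils ships a different one).
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    have_gnu_parallel=yes

    # One job per input line; {} is replaced by the line, -k keeps input order.
    per_line=$(printf '%s\n' a.txt b.txt c.txt | parallel -k echo processing {})

    # --pipe splits stdin into blocks (here about 1 kB each, cut at line
    # boundaries) and pipes each block into its own `wc -l`.
    total=$(seq 1000 | parallel --pipe --block 1k wc -l | awk '{ s += $1 } END { print s }')

    printf '%s\n' "$per_line"
    printf 'lines counted across all blocks: %s\n' "$total"
else
    have_gnu_parallel=no
    echo 'GNU parallel not found; skipping the examples' >&2
fi
```

Summing the per-block counts with `awk` is one simple way to recombine the results of `--pipe` jobs; a real job would replace `wc -l` with whatever command should consume each block.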

733 questions
10
votes
2 answers

How to use GNU parallel with find -exec?

I want to unzip multiple files. Using this answer, I found the following command: find -name '*.zip' -exec sh -c 'unzip -d "${1%.*}" "$1"' _ {} \; How do I use GNU Parallel with the above command to unzip multiple files? Edit 1: As per questions by…
Tarun Maganti
  • 3,076
  • 2
  • 35
  • 64
10
votes
3 answers

Basename in GNU Parallel

I have hundreds of files, named as follows: RG1-t.txt RG1-n.txt RG2-t.txt RG2-n.txt etc... I would like to use GNU parallel to run scripts on them, but I struggle to get the basenames of the files, so RG1, RG2 etc... so that I can run: ls RG*.txt |…
ATpoint
  • 603
  • 5
  • 17
10
votes
2 answers

How do I terminate GNU parallel without killing running jobs?

I'm running a bunch of shell scripts like parallel -a my_scripts bash and at some point I decided I've run enough of them and would like to stop spawning new jobs, and simply let all the existing jobs finish. Put another way, I want to kill the…
Yibo Yang
  • 2,353
  • 4
  • 27
  • 40
10
votes
1 answer

GNU parallel: does -k (keep output order) affect speed?

As said in the title, I'm wondering if the -k option (strongly) affects the speed of GNU parallel. In man parallel_tutorial there is a discussion about --ungroup and --line-buffer, which claims that --line-buffer, which unmixes output lines, is much…
4ae1e1
  • 7,228
  • 8
  • 44
  • 77
9
votes
1 answer

gnu parallel to parallelize a for loop

I have seen several questions about this topic, but I lack the ability to translate this to my specific problem. I have a for loop that loops through sub directories and then executes a .sh script on a compressed text file inside each directory. I…
Phil_T
  • 942
  • 9
  • 27
9
votes
3 answers

Why would gnu parallel chunking improve gzip's compression size?

File under: "Unexpected Efficiency Dept." The first 90 million numbers take up about 761MB, as output by: seq 90000000 According to man parallel, it can speed up gzip's archiving big files by chopping the input up, and using different CPUs to…
agc
  • 7,973
  • 2
  • 29
  • 50
9
votes
1 answer

gnu parallel: Prefix output with hostname(s)

Is it possible to prefix the output of gnu parallel when I run the same command on multiple hosts? I have 10 worker machines in a worker-pool and any one of them could've picked up the job. I want to find out which worker picked it up by grepping log…
Kashyap
  • 15,354
  • 13
  • 64
  • 103
9
votes
2 answers

Running bash script using gnu parallel

I have a script that uses while read to process a file line by line. When I do: head -n5 file1 | ./myscript.sh I get the results I expect. But trying to parallelize it using gnu parallel: head -n5 file1 | parallel -j 4 ./myscript.sh yields result file…
branquito
  • 3,864
  • 5
  • 35
  • 60
9
votes
2 answers

How to parallelize a "make" command that can distribute tasks across multiple machines

I have been compiling ".c / .c++" code that takes 1.5 hours to compile on a 4-core machine using the "make" command. I also have 10 more machines which I can use for compiling. I know the "-j" option in "make", which distributes compilation in a specified number of…
9
votes
2 answers

GNU parallel --jobs option using multiple nodes on cluster with multiple cpus per node

I am using gnu parallel to launch code on a high-performance computing (HPC) cluster that has 2 CPUs per node. The cluster uses TORQUE portable batch system (PBS). My question is to clarify how the --jobs option for GNU parallel works in this…
Steve Koch
  • 912
  • 8
  • 21
9
votes
2 answers

Parallelize nested for loop in GNU Parallel

I have a small bash script to OCR PDF files (slightly modified this script). The basic flow for each file is: For each page in pdf FILE: Convert page to TIFF image (ImageMagick) OCR image (tesseract) Cat results to text…
Tomas Greif
  • 21,685
  • 23
  • 106
  • 155
9
votes
1 answer

Inheriting environment variables with GNU Parallel

I would like to inherit environment variables in GNU Parallel. I have several 'scripts' (really just lists of commands, designed for use with GNU Parallel) with hundreds of lines each that all call different external programs. However, these…
Worbis
  • 93
  • 1
  • 3
8
votes
3 answers

How to do large file parallel encryption using GnuPG and GNU parallel?

I'm trying to write a parallel compress / encrypt backup script for archiving using GNU parallel, xz and GnuPG. The core part of the script is: tar --create --format=posix --preserve-permissions --same-owner --directory $BASE/$name --to-stdout . \ …
Yongbin Yu
  • 108
  • 1
  • 8
8
votes
2 answers

Run a specifiable number of commands in parallel - contrasting xargs -P, GNU parallel, and "moreutils" parallel

I'm trying to run multiple mongodumps on 26 servers in a bash script. I can run 3 commands like mongodump -h staging .... & mongodump -h production .... & mongodump -h web ... & at the same time, and when one finishes I want to start another…
basante
  • 515
  • 3
  • 9
  • 20
8
votes
2 answers

Why are 5 jobs run with GNU Parallel --jobs 4 option in the tutorial?

I am working through the GNU Parallel tutorial. In the "More than one argument" section, there is the following example (note: num30000 is a text file with numbers 1 to 30,000 on sequential lines): For better parallelism GNU Parallel can distribute…
Steve Koch
  • 912
  • 8
  • 21