
Here's a basic question: how do xargs and GNU Parallel differ when parallelizing code?

And are there use cases in which you'd use one over the other?

I ask this because I have seen answers to parallelization questions where using either tool has been deemed acceptable by the community.

Kleber Noel
    The answer to this might be quite contentious, and everyone else's mileage may vary (wildly), but IMHO **GNU Parallel** is significantly more powerful, flexible, configurable and capable, but less likely to be present on any given system. POSIX advocates may have differing views. Decide for yourself after reading this https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwjMvIDwo5XqAhVKLBoKHbwCALEQFjAAegQIBRAB&url=https%3A%2F%2Fzenodo.org%2Frecord%2F1146014%2Ffiles%2FGNU_Parallel_2018.pdf%3Fdownload%3D1&usg=AOvVaw35x0WxcO2IE5NhKvyL9JXP – Mark Setchell Jun 22 '20 at 11:05

1 Answer


Some of the differences are covered on: https://www.gnu.org/software/parallel/parallel_alternatives.html#differences-between-xargs-and-gnu-parallel

TL;DR: xargs is faster because it has almost no overhead (~0.3 ms/job compared to GNU Parallel's ~3 ms/job). GNU Parallel is safer because it takes all sorts of precautions so you do not need to worry (e.g. output from two jobs running in parallel will not mix). GNU Parallel has loads of features that xargs does not have. GNU Parallel requires Perl; xargs does not. xargs is installed nearly everywhere, whereas GNU Parallel has to be installed, or embedded in your own script with --embed, to be available everywhere.
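To make the comparison concrete, here is a minimal sketch of the same small job list run both ways. The "square a number" job is a made-up stand-in for any short command, and `-P` is assumed to be available (it is in GNU and BSD xargs but is not required by POSIX). The GNU Parallel part only runs if a GNU `parallel` binary is actually found.

```shell
#!/bin/sh
# Hypothetical job for illustration: square each input number.

# xargs: -P 4 allows up to 4 processes at once, -n 1 passes one argument per job.
# Jobs finish in no guaranteed order, so the output is sorted afterwards.
xargs_out=$(printf '%s\n' 1 2 3 4 | xargs -P 4 -n 1 sh -c 'echo "$(( $1 * $1 ))"' _ | sort -n)
echo "$xargs_out"

# GNU Parallel: -j 4 sets the number of job slots, -k keeps output in input order.
# Guarded, because GNU Parallel may not be installed on this system.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    parallel_out=$(parallel -j 4 -k 'echo $(( {} * {} ))' ::: 1 2 3 4)
    echo "$parallel_out"
fi
```

With xargs the ordering has to be restored by hand (here with `sort -n`), whereas `-k` asks GNU Parallel to do it for you.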

So in general: if the primary concern is to avoid overhead (e.g. if your jobs take only a few ms each) or to avoid installing Perl (e.g. if your system is embedded and thus resource-constrained), then use xargs (and take the relevant precautions depending on your input/output).
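As a rough illustration of what "relevant precautions" can look like with xargs, the sketch below passes file names NUL-delimited so that names containing spaces survive intact. The temporary directory and file names are invented for the example, and `-print0`, `-0` and `-P` are assumed to be supported by your find and xargs (they are in the GNU and BSD versions).

```shell
#!/bin/sh
# Hypothetical input: two files whose names contain spaces.
tmpdir=$(mktemp -d)
touch "$tmpdir/a b.txt" "$tmpdir/c d.txt"

# Input precaution: by default xargs splits on whitespace, which would break
# "a b.txt" into two arguments. find -print0 with xargs -0 separates names
# with NUL bytes instead.
# Output precaution: xargs does not group output per job, so each job here
# prints only one short line, and the result is sorted afterwards because
# completion order is not guaranteed.
names=$(find "$tmpdir" -type f -print0 | xargs -0 -P 2 -n 1 basename | sort)
echo "$names"

rm -rf "$tmpdir"
```

Jobs that print many lines each would need more care than this, for instance writing to one output file per job; that is the kind of bookkeeping GNU Parallel handles for you.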

Full disclosure: I have a vested interest in GNU Parallel.

Ole Tange