Given a task with several commands combined by pipe:

cat input/file1.json | jq '.responses[0] | {labelAnnotations: .labelAnnotations}' > output/file1.json

Now there are thousands of input JSON files, and I would like to use GNU Parallel to process them all in parallel. How can I do that? Something like this?

parallel cat {} | jq '...' > output/{./} ::: input/*.json

Note: it gets even more complicated if there is a pipe inside jq's filter...

Drake Guan

1 Answer

https://www.gnu.org/software/parallel/man.html#QUOTING says:

Conclusion: To avoid dealing with the quoting problems it may be easier just to write a small script or a function (remember to export -f the function) and have GNU parallel call that.

In your case it will look like this:

# Extract labelAnnotations from the first response in file $1 and write it to file $2
doit() {
  cat "$1" |
    jq '.responses[0] | {labelAnnotations: .labelAnnotations}' > "$2"
}
export -f doit

parallel doit {} output/{/} ::: input/*.json

A nice thing about this is that you can test it:

doit input/foo1.json output/foo1.json

And when that works, parallelizing it is trivial.
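
As a rough end-to-end sketch of that workflow (test one file, then parallelize), something like the following could be used. The scratch directory, the sample JSON contents, and the file names file1.json to file3.json are made up for illustration; only doit and the parallel invocation come from the answer above. It assumes bash, jq, and GNU Parallel are installed, and skips the demo otherwise.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Skip the demo if jq or GNU Parallel is missing (moreutils also ships a
# different, incompatible tool named "parallel").
command -v jq >/dev/null || { echo "jq not found; skipping demo" >&2; exit 0; }
if ! parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  echo "GNU Parallel not found; skipping demo" >&2
  exit 0
fi

# Hypothetical scratch area with a few made-up input files
workdir=$(mktemp -d)
mkdir -p "$workdir/input" "$workdir/output"
for i in 1 2 3; do
  printf '{"responses":[{"labelAnnotations":[{"description":"label%s"}],"other":true}]}\n' "$i" \
    > "$workdir/input/file$i.json"
done

# Same function as in the answer: the pipe and the jq filter live inside
# the function body, so no extra quoting is needed on the parallel line.
doit() {
  cat "$1" |
    jq '.responses[0] | {labelAnnotations: .labelAnnotations}' > "$2"
}
export -f doit

# Step 1: try the function on a single file and inspect the result
doit "$workdir/input/file1.json" "$workdir/output/file1.json"
cat "$workdir/output/file1.json"

# Step 2: run it on every input file; {/} is the basename of each input.
# Exported functions are a bash feature, so make sure jobs run under bash.
SHELL=$(command -v bash) parallel doit {} "$workdir/output/{/}" ::: "$workdir"/input/*.json
ls "$workdir/output"
```

Because the jq filter sits inside the function, a filter that itself contains pipes needs no additional escaping on the parallel command line.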

If you have a newer version of GNU Parallel, this should work too:

parallel --results output/{/} -q jq '.responses[0] | {labelAnnotations: .labelAnnotations}' ::: input/*.json
Ole Tange
  • Thank you! I solved these quoting challenges by writing a small script. Your tips about `-q` and `--results` are awesome, too! – Drake Guan Apr 13 '17 at 18:57
  • Hmm, I am having a similar problem with no solution: https://stackoverflow.com/questions/75268073/using-jq-and-gnu-parallel-together – AJW Jan 28 '23 at 15:46