
I recently found this solution to less through compressed .gz files in parallel, based on the number of cores available.

find . -name "*.gz" | xargs -n 1 -P 3 zgrep -H '{pattern to search}'

P.S. 3 is the number of cores
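Rather than hard-coding the core count, a minimal sketch can let `nproc` (from GNU coreutils, assumed to be installed) supply the `-P` value. The demo directory, the file names `a.gz`/`b.gz` and the pattern `needle` are hypothetical, only there so the pipeline has something to search.

```shell
#!/bin/sh
# Sketch: the question's zgrep-in-parallel pipeline, with the job count
# taken from nproc (GNU coreutils, assumed available) instead of a literal 3.

# Hypothetical demo tree so the pipeline has something to search.
demo=$(mktemp -d)
printf 'alpha\nneedle here\n' | gzip > "$demo/a.gz"
printf 'nothing to see\n' | gzip > "$demo/b.gz"

jobs=$(nproc)   # number of processing units available to this process

# A file with no match makes zgrep exit 1, so xargs exits non-zero
# (123 with GNU xargs); `|| true` keeps that from counting as a failure.
matches=$(cd "$demo" && find . -name '*.gz' | xargs -n 1 -P "$jobs" zgrep -H 'needle') || true

printf '%s\n' "$matches"
rm -rf "$demo"
```

With `-H`, each matching line is prefixed by the file name, so only `a.gz` shows up in the output.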

I was wondering if there was a way to do it for bz2 files as well. Currently I am using this command:

find -type f -name '*.bz2' -execdir bzgrep "{text to find}" {} /dev/null \;
Charles Duffy
john.p.doe
  • Just substitute `bzgrep` for `zgrep` in the `xargs`? – blm Nov 10 '15 at 20:26
  • Eh? It's not the less that's parallelized, it's the grepping. Actually, you don't have `less` in your question anywhere at all... and you're parallelizing the easy way, multiple files at the same time but only one thread of execution per file, as opposed to the only-sometimes-possible way, decompressing the same file from multiple points in parallel (which requires the compressor to be configured to periodically reset itself and build a new table -- enabling parallel decoding at some cost to performance and output size). – Charles Duffy Nov 10 '15 at 20:34
  • Also, your current version of the gzip one won't work for all possible filenames, since it's taking the output from `find` in line-oriented form, but filenames are allowed to contain literal newlines. To be fully safe, you need to use NUL delimiters (which can't exist in filenames or other content represented by C strings). – Charles Duffy Nov 10 '15 at 20:36
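To make the newline problem concrete, here is a small sketch that creates a hypothetical `.gz` file whose name contains a literal newline and runs both the line-oriented and the NUL-delimited pipelines against it. The directory, the file name and the pattern are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: a hypothetical .gz file whose name contains a literal newline.
demo=$(mktemp -d)
nl='
'
printf 'needle\n' | gzip > "$demo/odd${nl}name.gz"

# Line-oriented: xargs splits the single path into two bogus arguments,
# so zgrep cannot open either of them (errors are silenced here).
if find "$demo" -name '*.gz' | xargs -n 1 -P 3 zgrep -q 'needle' 2>/dev/null
then line_ok=yes; else line_ok=no; fi

# NUL-delimited: find ends each path with \0 and xargs -0 splits only
# on \0, so the path reaches zgrep intact.
if find "$demo" -name '*.gz' -print0 | xargs -0 -n 1 -P 3 zgrep -q 'needle'
then nul_ok=yes; else nul_ok=no; fi

echo "line-oriented: $line_ok, NUL-delimited: $nul_ok"
rm -rf "$demo"
```

Only the `-print0` / `-0` pair finds the match; the line-oriented version hands zgrep two paths that do not exist.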

1 Answer


Change *.gz to *.bz2; change zgrep to bzgrep, and there you are.

For a bit of extra safety around unusual filenames, use -print0 on the find end and -0 on the xargs:

find . -name "*.bz2" -print0 | xargs -0 -n 1 -P 3 bzgrep -H '{pattern to search}'
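A quick way to try the command above is a throwaway directory with a couple of hypothetical `.bz2` files, one of them with a space in its name. This sketch assumes `bzip2` and `bzgrep` are installed, and it relies only on the matched text, not on the exact file-name prefix, which I have not checked across bzgrep versions.

```shell
#!/bin/sh
# Sketch: the answer's bz2 pipeline on a hypothetical demo tree,
# including a file name with a space that -print0 / -0 keep intact.
demo=$(mktemp -d)
printf 'alpha\nneedle here\n' | bzip2 > "$demo/with space.bz2"
printf 'nothing to see\n' | bzip2 > "$demo/other.bz2"

# bzgrep exits 1 for the non-matching file, so xargs exits non-zero
# (123 with GNU xargs); `|| true` ignores that for this demo.
matches=$(cd "$demo" && find . -name "*.bz2" -print0 | xargs -0 -n 1 -P 3 bzgrep -H 'needle') || true

printf '%s\n' "$matches"
rm -rf "$demo"
```

If a script needs to tell "no match" apart from a real error, check the exit status of xargs rather than swallowing it with `|| true`.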
Charles Duffy