Is there any way to grep through bz2 files in parallel?

Question

I recently found this solution to less through compressed gz files in parallel, based on the number of cores available:

find . -name "*.gz" | xargs -n 1 -P 3 zgrep -H '{pattern to search}'

P.S. 3 is the number of cores
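Rather than hardcoding 3, the core count can be looked up at run time. Below is a minimal sketch. It assumes GNU coreutils `nproc` is available and falls back to `getconf _NPROCESSORS_ONLN`, which is also widely available. `search_gz` is a hypothetical helper name, not an existing command. It also uses NUL-delimited filenames, the variant recommended further down the thread.

```shell
# Hypothetical helper: search every .gz file under a directory in parallel,
# one zgrep process per file, as many at once as there are online cores.
search_gz() {
  pattern=$1
  dir=${2:-.}
  # nproc is GNU coreutils; getconf _NPROCESSORS_ONLN is the fallback.
  cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
  # -print0 / -0 keep filenames with spaces or newlines intact;
  # -H makes zgrep prefix each match with the file it came from.
  find "$dir" -type f -name '*.gz' -print0 |
    xargs -0 -n 1 -P "$cores" zgrep -H "$pattern"
}
```

Usage: `search_gz 'pattern to search' /path/to/logs`. Note that xargs exits non-zero (123 with GNU xargs) whenever any file has no match, because zgrep itself exits 1 in that case.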

I was wondering if there was a way to do it for bz2 files as well. Currently I am using this command:

find -type f -name '*.bz2' -execdir bzgrep "{text to find}" {} /dev/null \;

Eh? It's not the less that's parallelized, it's the grepping. Actually, you don't have `less` in your question anywhere at all... and you're parallelizing the easy way, multiple files at the same time but only one thread of execution per file, as opposed to the only-sometimes-possible way, decompressing the same file from multiple points in parallel (which requires the compressor to be configured to periodically reset itself and build a new table -- enabling parallel decoding at some cost to performance and output size). — Charles Duffy, Nov 10 '15 at 20:34
Also, your current version of the gzip one won't work for all possible filenames, since it's taking the output from `find` in line-oriented form, but filenames are allowed to contain literal newlines. To be fully safe, you need to use NUL delimiters (which can't exist in filenames or other content represented by C strings). — Charles Duffy, Nov 10 '15 at 20:36
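The newline problem can be seen directly. This sketch uses a throwaway directory and a hypothetical file named `odd<newline>name.gz`. It substitutes `printf '<%s>\n'` for zgrep so the output shows exactly which argument each invocation receives.

```shell
# Throwaway directory holding one gzipped file whose name contains a
# literal newline (legal on POSIX filesystems).
dir=$(mktemp -d)
printf 'needle\n' | gzip > "$dir/odd
name.gz"

# Line-oriented: xargs splits the single name into two bogus arguments,
# "$dir/odd" and "name.gz", so zgrep would be handed files that don't exist.
find "$dir" -name '*.gz' | xargs -n 1 printf '<%s>\n'

# NUL-delimited: the whole name reaches the command as one argument.
find "$dir" -name '*.gz' -print0 | xargs -0 -n 1 printf '<%s>\n'
```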

Charles Duffy · Accepted Answer · 2019-10-17T12:19:22.257


Change *.gz to *.bz2; change zgrep to bzgrep, and there you are.

For a bit of extra safety around unusual filenames, use -print0 on the find end and -0 on the xargs:

find . -name "*.bz2" -print0 | xargs -0 -n 1 -P 3 bzgrep -H '{pattern to search}'

edited Oct 17 '19 at 12:19

answered Nov 10 '15 at 20:35

