grep multiple patterns single file argument list too long

Question

I am searching for multiple patterns in a file. The file is 90 GB in size, and I am searching on a particular field (positions 6-17 in each line). I am trying to get all the lines that contain any of a particular list of numbers. The current syntax I am using is:

grep '^.\{6\}0000000012345\|^.\{6\}0000000012543' somelargeFile.txt > outputFile.txt

For a small number of patterns this works. For a large number of patterns I get the "Argument list too long" error.

One alternative I have tried is to search for each pattern separately (using a for loop over the patterns), but this requires multiple passes over the large data file (57102722 lines), which is not efficient.

From what I understand, the "Argument list too long" error is related to bash commands in general and is not specific to grep. Is there any setting that can be used to get around this error? Alternatively, any ideas on how to do this using awk, sed, or another tool?

Thank you!

Weird, the "Argument list too long" normally refers to the files you are mentioning. Side note: what about saying `grep -E '^.{6}00000000(12345|12543)'`? — fedorqui, Jul 17 '15 at 15:56
You can look at this question: https://stackoverflow.com/questions/31479822/grep-multiple-patterns-single-file-argument-list-too-long — Juan Miguel Díaz Pérez, Aug 07 '20 at 11:22

score 6 · Accepted Answer · answered Jul 17 '15 at 16:02


You can avoid the problem by putting the patterns in a file and using grep's -f command-line option.

The most convenient approach is to put each alternative on a separate line of the file:

patterns.txt

^.\{6\}0000000012345
^.\{6\}0000000012543

invocation

grep -f patterns.txt somelargeFile.txt > outputFile.txt
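
If the numbers are already available one per line, the pattern file can be generated rather than typed by hand. The following is only a sketch under assumptions: numbers.txt, sampleFile.txt and sampleOutput.txt are hypothetical names that do not appear in the question, and the tiny sample file stands in for the real 90 GB input.

```shell
# Hypothetical list of values to search for, one per line.
printf '%s\n' 0000000012345 0000000012543 > numbers.txt

# Small stand-in for the large data file: 6 leading characters, then the field.
printf '%s\n' \
  'AAAAAA0000000012345 first' \
  'BBBBBB0000000099999 second' \
  'CCCCCC0000000012543 third' > sampleFile.txt

# Prefix every number with the anchor used in the question.
# The doubled backslashes in the replacement produce literal backslashes,
# so each output line looks like: ^.\{6\}0000000012345
sed 's/^/^.\\{6\\}/' numbers.txt > patterns.txt

# The patterns are read from the file instead of being passed on the
# command line, so the argument-length limit no longer applies to them.
grep -f patterns.txt sampleFile.txt > sampleOutput.txt
```

With the sample data above, sampleOutput.txt should contain the "first" and "third" lines only.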


rici


Using the -f option does resolve the issue here. I don't get the "Argument list too long" error. But I do have a question regarding the -f option: does it pass over the data once for any number of entries in patterns.txt, or once for each pattern in patterns.txt? – reddy Jul 17 '15 at 18:02
It compiles a single regex and then reads the input file once. – rici Jul 17 '15 at 18:34
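
Since the question also asks about awk, a single-pass lookup on the fixed-width field can be sketched as follows. This is not from the answer above. It assumes the field starts at column 7 and is 13 characters wide (which is what the regex in the question matches), that numbers.txt is non-empty, and that the file names are hypothetical stand-ins. It compares exact strings instead of matching regular expressions.

```shell
# Hypothetical list of values and a small stand-in for the large data file.
printf '%s\n' 0000000012345 0000000012543 > numbers.txt
printf '%s\n' \
  'AAAAAA0000000012345 first' \
  'BBBBBB0000000099999 second' \
  'CCCCCC0000000012543 third' > sampleFile.txt

# While reading the first file (NR == FNR), store each number as an array key.
# For the second file, print a line when the 13 characters starting at
# column 7 are one of the stored keys. The data file is read exactly once.
awk 'NR == FNR { want[$0]; next }
     (substr($0, 7, 13) in want)' numbers.txt sampleFile.txt > sampleOutputAwk.txt
```

The design choice here is a hash lookup per line, so the cost per line does not grow with the number of values in numbers.txt.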

Avinash Raj · Answer 2 · 2015-07-17T16:36:17.620

1

Try using the alternation operator to factor out the common prefix:

grep '^.\{6\}0000000012\(345\|543\)' somelargeFile.txt > outputFile.txt

edited Jul 17 '15 at 16:36

answered Jul 17 '15 at 15:55

