
What is the best way to extract lines from a very large gzip (.gz) file that match any of multiple strings listed in a second file?

I've tried the following, which works for that one string and the 50 lines before it:

gunzip -c /myfolder/large_file.gz | grep -B 50 "33754548"  > /myfolder/specific_linesfrom_large_files.txt

However, the other strings I need are not always within 50 lines of that match, so I attempted:

gunzip -c /myfolder/large_file.gz | grep  -F  /myfolder/multiple_strings.txt  > /myfolder/specific_linesfrom_large_files.txt

This didn't work. Any suggestions?

For example, multiple_strings.txt might contain:

16804029
42061608
42069963
42072123
177479064
177420374
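A likely reason the second command fails: -F only tells grep to treat patterns as fixed strings; it does not make grep read patterns from a file. Without -f, the first non-option argument, /myfolder/multiple_strings.txt, is itself taken as the literal search string, so no data line matches. The sketch below reproduces this on small made-up data in a temporary directory (the sample lines are hypothetical stand-ins for the real files).

```shell
# Hypothetical sample data in a throwaway directory.
tmp=$(mktemp -d)
printf '16804029\n42061608\n' > "$tmp/multiple_strings.txt"
printf 'id 16804029 a\nid 99999999 b\nid 42061608 c\n' | gzip > "$tmp/large_file.gz"

# Without -f, grep uses the path string itself as the fixed pattern,
# so nothing in the data matches.
without_f=$(gunzip -c "$tmp/large_file.gz" | grep -F "$tmp/multiple_strings.txt")

# With -f, grep reads one pattern per line from the file.
with_f=$(gunzip -c "$tmp/large_file.gz" | grep -F -f "$tmp/multiple_strings.txt")

printf 'without -f: %s\n' "${without_f:-(no matches)}"
printf 'with -f:\n%s\n' "$with_f"
rm -r "$tmp"
```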
Ramy M. Mousa
user3403622

2 Answers


Use zgrep to search inside gzip-compressed files. Similar commands exist for other compression formats, such as bzgrep for bzip2 files and xzgrep for xz files.

zgrep -f match_strings.txt file.gz

-f reads the patterns from the specified file, one pattern per line.
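A minimal sketch of the zgrep approach on made-up data, assuming GNU zgrep, which passes grep options such as -F through to grep. Adding -F makes each line of the pattern file a literal string rather than a regular expression, which suits a list of plain numbers and can be faster when there are many patterns. The temporary paths and sample lines are hypothetical.

```shell
# Hypothetical pattern file and compressed data file.
tmp=$(mktemp -d)
printf '16804029\n42061608\n177479064\n' > "$tmp/match_strings.txt"
printf 'x 16804029\nx 11111111\nx 177479064\n' | gzip > "$tmp/file.gz"

# -f: read patterns from the file, one per line.
# -F: treat each pattern as a fixed string, not a regular expression.
result=$(zgrep -F -f "$tmp/match_strings.txt" "$tmp/file.gz")
printf '%s\n' "$result"
rm -r "$tmp"
```

Only the lines containing 16804029 and 177479064 should be printed; 42061608 does not occur in the sample data.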

thanasisp
gunzip -c /myfolder/large_file.gz | grep -f /myfolder/multiple_strings.txt > /myfolder/specific_linesfrom_large_files.txt

Adding -x restricts matches to whole lines: the entire line must equal one of the patterns. For example, without -x the pattern 123 also matches a line containing 1234; with -x it matches only a line that is exactly 123.
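The sketch below, on hypothetical sample lines, contrasts no flag, -x, and -w. Because -x requires the entire line to equal a pattern, it misses lines where the number is only one field among others; GNU grep's -w, which requires the match to form a whole word (not preceded or followed by a letter, digit, or underscore), may fit that case better. The same flags work unchanged in the gunzip -c ... | grep pipeline above.

```shell
# Hypothetical data: the pattern 123, plus lines that differ in how they contain it.
tmp=$(mktemp -d)
printf '123\n' > "$tmp/patterns.txt"
printf '123\n1234\nid 123 ok\n' > "$tmp/data.txt"

# No flag: substring match, so all three lines match.
plain=$(grep -F -f "$tmp/patterns.txt" "$tmp/data.txt")
# -x: the whole line must equal a pattern, so only "123" matches.
exact=$(grep -F -x -f "$tmp/patterns.txt" "$tmp/data.txt")
# -w: the match must be a whole word, so "123" and "id 123 ok" match, "1234" does not.
word=$(grep -F -w -f "$tmp/patterns.txt" "$tmp/data.txt")

printf 'no flag:\n%s\n-x:\n%s\n-w:\n%s\n' "$plain" "$exact" "$word"
rm -r "$tmp"
```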

Darby_Crash