
I have file1.txt with 100 entries. I need to search for the contents of file1.txt in file2.bz2, which is a large bzip2 file. bzgrep -f file1.txt file2.bz2 takes a long time.

Joshua
ush rani

2 Answers

You can do nothing: the file is compressed, and the only way to search it is to decompress it.
One possible workaround is to keep an uncompressed copy of the file.
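
A minimal sketch of that workaround, assuming the 100 entries in file1.txt are fixed strings rather than regular expressions (that assumption is what the -F flag below relies on):

    bunzip2 -k file2.bz2           # -k keeps file2.bz2 and writes the uncompressed file2
    grep -F -f file1.txt file2     # -F treats each pattern as a fixed string, usually much faster than regex matching

If disk space is the concern, you can decompress to a temporary file before a batch of searches and delete it afterwards; the payoff comes from decompressing once instead of on every search.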

Romeo Ninov

You can do a lot, but it's a truly excessive amount of work.

bzip2 files are composed of chunks. You can cut the file up by chunks, full-text index each one, and save the indexes. If you have some idea of the keywords, you can filter your indexes; otherwise you get the full index mayhem from all the text, which tends to be something like 10-100 times the size of the original uncompressed document.

You can make this work if the words to be indexed occur only in certain places (or you can limit the number of words to be indexed) and searches are much more frequent than documents.

Idea blatantly stolen from here: https://www.thanassis.space/buildWikipediaOffline.html
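
A rough sketch of that approach in shell, assuming the data can be split on line boundaries; the chunk_ and .words file names are made up for illustration, and the "index" here is just a sorted list of the unique words in each chunk:

    # Build: split the data into chunks, compress each one, keep a word list per chunk.
    bzcat file2.bz2 | split -l 100000 - chunk_
    for c in chunk_*; do
        tr -cs '[:alnum:]' '\n' < "$c" | sort -u > "$c.words"   # crude per-chunk index
        bzip2 "$c"                                              # chunk_aa -> chunk_aa.bz2
    done

    # Search: decompress only the chunks whose word list contains the term.
    term=example
    grep -l -x -F "$term" chunk_*.words |
    while read -r idx; do
        bzcat "${idx%.words}.bz2" | grep -F "$term"
    done

For the original 100-entry file1.txt you would grep the .words files with -f file1.txt instead of a single term, then run the full search only over the matching chunks; as noted above, this only pays off when searches are far more frequent than rebuilds of the index.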

Joshua
  • This is an interesting idea, but it will pay off only if you have hundreds of documents. For just a few files the effort will be more expensive than just keeping uncompressed files. – Romeo Ninov Feb 07 '19 at 09:06
  • @RomeoNinov: Well, he doesn't say anything about his scale factors. – Joshua Feb 07 '19 at 14:22