
I am writing a function in an Android application that searches for a keyword in many large binary files. Currently, I am using the "grep" command to check whether the keyword exists in each file.

Because there are many LARGE binary files, I am facing a time problem. I would appreciate your help.

My problem:

1) Search for keywords in many large binary files

2) Find a solution faster than grep to fix the time problem

Long Uni
  • If you just want to know whether the word is there, use `grep` with the `-l` (lowercase L) option. That way, as soon as a match is found it prints the file name and stops searching for further matches. – fedorqui Apr 21 '15 at 15:33
  • @fedorqui I believe that's the default behavior for binary files. LongUni - `grep -E 'word1|word2' file1 file2` will be your fastest option, but think about what a "word" means to you and consider whether or not you need some kind of word boundaries so `the` doesn't match `there` (whatever the binary-file equivalent is). – Ed Morton Apr 21 '15 at 16:37

1 Answer


When you are searching for a byte sequence in a large file, your bottleneck will be disk bandwidth, then RAM bandwidth. Your goal is to minimize the number of data transfers, ideally to a single transfer from disk to RAM. I would try to map the file into memory piece by piece using FileChannel, then search the mapped buffer directly.
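
For reference, here is a minimal Java sketch of the approach described above: map the file in fixed-size pieces with `FileChannel.map()` and scan each mapped piece for the pattern. The chunk size, class and method names, and the naive inner scan are illustrative assumptions, not code from the answer.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedSearch {

    // True if 'pattern' occurs anywhere in the file at 'path'.
    // The file is mapped in fixed-size pieces; pieces overlap by
    // (pattern.length - 1) bytes so a match straddling a boundary is not missed.
    static boolean contains(String path, byte[] pattern) throws IOException {
        final long CHUNK = 64L * 1024 * 1024; // 64 MiB per mapping (tunable)
        try (FileChannel ch = new RandomAccessFile(path, "r").getChannel()) {
            long size = ch.size();
            long pos = 0;
            while (pos < size) {
                long len = Math.min(CHUNK, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                if (indexOf(buf, pattern) >= 0) {
                    return true;
                }
                if (len < CHUNK) {
                    break; // reached the end of the file
                }
                pos += len - (pattern.length - 1); // step back to cover boundary matches
            }
        }
        return false;
    }

    // Naive scan of the mapped buffer; could be replaced by Boyer-Moore
    // (see the comments below this answer).
    static int indexOf(MappedByteBuffer buf, byte[] pattern) {
        int last = buf.limit() - pattern.length;
        outer:
        for (int i = 0; i <= last; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (buf.get(i + j) != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] keyword = "keyword".getBytes("UTF-8");
        System.out.println(contains(args[0], keyword));
    }
}
```

The inner scan is deliberately simple; swapping it for a skip-based search such as Boyer-Moore (suggested in the comments) would be the next step if the naive scan is still too slow.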

msh
  • Thanks msh for your comment. I will test your solution against mine. I prefer to code in C (JNI); do you have any ideas for that? – Long Uni Apr 22 '15 at 00:59
  • Same idea: mmap the file, then do the search (for example, with the Boyer–Moore string search algorithm); see the sketch after these comments. – msh Apr 22 '15 at 02:41
  • I will try your advice. There is a --mmap option in the grep command. Which do you think would be faster: our own implementation, or grep with that option? – Long Uni Apr 22 '15 at 13:52
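
To make the Boyer–Moore suggestion in the comments concrete, below is a sketch of the simplified Boyer–Moore–Horspool variant (bad-character rule only) over a byte array. It is written in Java to match the answer; a C/JNI port working on an `mmap`'ed region would follow the same logic. The class and method names are placeholders.

```java
// Boyer-Moore-Horspool: a simplified Boyer-Moore variant using only the
// bad-character rule. On a mismatch it can skip ahead by up to
// pattern.length bytes, which is what makes it faster than a
// byte-by-byte scan on large inputs.
public final class HorspoolSearch {

    // Returns the index of the first occurrence of 'pattern' in 'data', or -1.
    public static int indexOf(byte[] data, byte[] pattern) {
        if (pattern.length == 0) return 0;
        if (data.length < pattern.length) return -1;

        // Shift table: how far the search window may skip when the byte
        // aligned with the end of the pattern is a given value.
        int[] shift = new int[256];
        java.util.Arrays.fill(shift, pattern.length);
        for (int i = 0; i < pattern.length - 1; i++) {
            shift[pattern[i] & 0xFF] = pattern.length - 1 - i;
        }

        int i = 0;
        while (i <= data.length - pattern.length) {
            int j = pattern.length - 1;
            while (j >= 0 && data[i + j] == pattern[j]) {
                j--;
            }
            if (j < 0) return i; // full match found at offset i
            i += shift[data[i + pattern.length - 1] & 0xFF];
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        byte[] haystack = "some large binary blob with the keyword inside".getBytes("UTF-8");
        byte[] needle = "keyword".getBytes("UTF-8");
        System.out.println(indexOf(haystack, needle)); // prints the match offset
    }
}
```

Since the keyword is fixed, the shift table only needs to be built once and can be reused across all files being searched.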