41

I'm generating binary data files that are simply a series of records concatenated together. Each record consists of a binary header followed by binary data. Within the header is an ASCII string 80 characters long. Somewhere along the way, my process of writing the files went wrong, and I'm trying to debug the problem by inspecting how long each record actually is.

This seems closely related, but I don't know Perl, so I haven't been able to get the accepted answer there to work. The other answer points to bgrep, which I've compiled, but it wants a hex string. I'd rather have a tool that takes the ASCII string, finds it in the binary data, and prints the string and the byte offset where it was found.

In other words, I'm looking for some tool which acts like this:

tool foobar filename

or

tool foobar < filename

and its output is something like this:

foobar:10
foobar:410
foobar:810
foobar:1210
...

i.e. the string that matched and the byte offset in the file where the match started. In this example, I can infer that each record is 400 bytes long.
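The record-length inference is just the difference between consecutive offsets. A minimal Python sketch, assuming the tool's output uses the `match:offset` format shown in the example (the function name `record_lengths` is made up for illustration):

```python
def record_lengths(lines):
    """Return the gaps between consecutive offsets in 'match:offset' lines."""
    # Split on the last colon so a colon inside the matched string is harmless.
    offsets = [int(line.rsplit(":", 1)[1]) for line in lines if line.strip()]
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]


sample = ["foobar:10", "foobar:410", "foobar:810", "foobar:1210"]
print(record_lengths(sample))  # [400, 400, 400]
```

A gap that differs from the others would point at the record where the writer went wrong.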

Other constraints:

  • ability to search by regex is cool, but I don't need it for this problem
  • My binary files are big (3.5 GB), so I'd like to avoid reading the whole file into memory if possible.
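For reference, the behaviour described here (fixed-string search, byte offsets, bounded memory) can be sketched in Python by reading the file in chunks and carrying over the last `len(needle) - 1` bytes so matches that straddle a chunk boundary are not lost. This is only an illustration, not an existing tool: `find_offsets` and `chunk_size` are hypothetical names, and unlike `grep -o` it also reports overlapping matches.

```python
import io


def find_offsets(stream, needle, chunk_size=1 << 20):
    """Yield the byte offset of every occurrence of needle in a binary stream."""
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1
    tail = b""   # bytes carried over from the previous chunk
    base = 0     # file offset of the first byte of tail + chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        pos = buf.find(needle)
        while pos != -1:
            yield base + pos
            pos = buf.find(needle, pos + 1)
        # Keep too few bytes to hold a full match, so nothing is reported twice.
        tail = buf[-overlap:] if overlap else b""
        base += len(buf) - len(tail)


# In-memory stand-in for a file of four 400-byte records.
data = b"".join(b"\x00" * 10 + b"foobar" + b"\x00" * 384 for _ in range(4))
for offset in find_offsets(io.BytesIO(data), b"foobar", chunk_size=64):
    print(f"foobar:{offset}")
```

For a real file, the stream would be something like `open(filename, "rb")`; memory use stays at roughly one chunk regardless of file size.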
mgilson
  • Argv! I don't know at what point my mastery of English grammar slid into the mire. Thanks for fixing that for me @Kevin – mgilson Jan 03 '13 at 15:24

3 Answers

47
grep --byte-offset --only-matching --text foobar filename

The --byte-offset option prints the offset of each matching line.

The --only-matching option makes grep print only the matched text, so the offset reported is that of each match rather than of each matching line.

The --text option makes grep treat the binary file as a text file.

You can shorten it to:

grep -oba foobar filename

This works with GNU grep, which ships with Linux by default. It won't work with BSD grep (which ships with macOS by default).
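Note that GNU grep prints the offset first (`10:foobar`), not last as in the question's example. If the `match:offset` order matters, a small Python sketch can flip each line; `flip_grep_line` is a made-up helper name, and it assumes one `offset:match` pair per line.

```python
def flip_grep_line(line):
    """Turn 'offset:match' (GNU grep -ob style) into 'match:offset'."""
    # Split on the first colon only: the offset is numeric, the match may contain colons.
    offset, match = line.rstrip("\n").split(":", 1)
    return f"{match}:{int(offset)}"


for line in ["10:foobar", "410:foobar"]:
    print(flip_grep_line(line))
```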

Hari Menon
  • 1
    I tried this, all it says is: `Binary file filename matches`. My system is Ubuntu Linux, and `grep --version` gives: "GNU grep 2.5.2" – mgilson Jan 03 '13 at 15:02
  • 3
    Try adding the `-a` option to treat binary files as text – Hari Menon Jan 03 '13 at 15:05
  • 3
    It *could* work in OS X grep if you prefix the command with `LC_CTYPE=C `; however, recent (and maybe not so recent) OS X ships grep 2.5.1, which has a bug that always outputs 0 as the byte offset. – Ivan X Jan 31 '16 at 12:55
  • 2
    I'd suggest using `grep -F` if you just need to find a known string, as it has a lot less overhead. – Hitechcomputergeek May 25 '16 at 22:50
30

You could use strings for this:

strings -a -t x filename | grep foobar

Tested with GNU binutils.

For example, where in /bin/ls does --help occur:

strings -a -t x /bin/ls | grep -- --help

Output:

14938 Try `%s --help' for more information.
162f0       --help     display this help and exit
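One caveat: the offset that strings prints is where the whole printable run starts, not where the search term sits inside it. A hedged Python sketch for converting such lines to decimal offsets of the term itself; `needle_offsets` is a hypothetical name, and the parsing assumes the layout seen above (optional padding, hex offset, a single space, then the string) with one byte per character. Only the first occurrence in each string is reported.

```python
def needle_offsets(strings_output, needle):
    """Yield decimal byte offsets of needle from `strings -a -t x` output lines."""
    for line in strings_output.splitlines():
        # Assumed layout: optional padding, hex offset, one space, then the string.
        hex_offset, _, text = line.lstrip().partition(" ")
        pos = text.find(needle)
        if pos != -1:
            # strings reports where the printable run starts, so add the
            # position of the needle inside that run (one byte per character).
            yield int(hex_offset, 16) + pos


sample = "  14938 Try `%s --help' for more information.\n"
print(list(needle_offsets(sample, "--help")))
```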
Thor
  • 5
    I ended up using `strings -a -t d filename | grep foobar` to write the output in decimal instead of hex. Otherwise, great answer that seems like it will work with different flavors of `grep`. – mgilson Jan 03 '13 at 15:58
  • 2
    `grep -oba` (see Hari Menon's answer) is much faster, but using `strings` allows you to do partial matching. Which answer is better depends on your use-case! – Luc Sep 06 '18 at 04:45
1

I wanted to do the same task. Though strings | grep worked, I found gsar was the very tool I needed.

http://tjaberg.com/

The output looks like:

>gsar.exe -bic -sfoobar filename.bin
filename.bin: 0x34b5: AAA foobar BBB
filename.bin: 0x56a0: foobar DDD
filename.bin: 2 matches found
Thor
caesun