The strings command prints the strings of printable characters found in a binary file.

I am curious to know how it works on a high level.

It shouldn't be straightforward since each binary file has a different format, from executables to PDFs and others. So each byte can mean different things from an ASCII/Unicode character to other metadata.

So does it know all those binary file formats? In that case, it wouldn't be able to work with a new or custom binary format.

UPDATE: I know what the strings command does. I just want to know how it does what it does.

kamalbanga

2 Answers

strings does not attempt to parse all kinds of files. It scans any file for a long enough sequence of 'printable characters', and when found, shows it. See? No "parsing" involved. (With one exception.)

.. So each byte can mean different things from an ASCII/Unicode character to other metadata.

Only up to a certain point. strings is very straightforward, as it does not attempt to 'parse' for meanings. That is, it does not see the difference between a text string "Hello world" and any random binary sequence that happens to contain the bytes 0x48, 0x65, 0x6C, 0x6C, 0x6F (etc.) in that particular order.
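
The scan described above can be sketched in a few lines of Python. This is only an illustration, not the actual GNU implementation: the function name `find_strings`, the exact set of bytes counted as printable (ASCII 0x20 to 0x7E plus tab), and the sample bytes are assumptions made for the example.

```python
# Bytes treated as printable in this sketch: ASCII 0x20..0x7E plus tab.
# (Assumption: the real tool's character set may differ in detail.)
PRINTABLE = frozenset(range(0x20, 0x7F)) | {0x09}


def find_strings(data: bytes, min_len: int = 4):
    """Yield each run of at least min_len printable bytes found in data."""
    run = bytearray()
    for byte in data:
        if byte in PRINTABLE:
            run.append(byte)
            continue
        # A non-printable byte ends the current run.
        if len(run) >= min_len:
            yield run.decode("ascii")
        run.clear()
    # A run that reaches the end of the data also counts.
    if len(run) >= min_len:
        yield run.decode("ascii")


# Made-up bytes: "ELF" and "abc" are too short, the other two runs qualify.
blob = b"\x7fELF\x02\x01\x00Hello world\x00\x90\x90abc\x00/lib/ld.so\x00"
print(list(find_strings(blob)))  # ['Hello world', '/lib/ld.so']
```

Nothing in the loop knows what kind of file the bytes came from, which is why the same scan works on any format.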

The only allowance it makes is that you can tell it to (attempt to) interpret the raw bytes as a different character encoding:

-e encoding
--encoding=encoding
Select the character encoding of the strings that are to be found. Possible values for encoding are: s = single-7-bit-byte characters (ASCII, ISO 8859, etc., default), S = single-8-bit-byte characters, b = 16-bit bigendian, l = 16-bit littleendian, B = 32-bit bigendian, L = 32-bit littleendian. Useful for finding wide character strings.

(http://unixhelp.ed.ac.uk/CGI/man-cgi?strings)

And again, it merely does what you told it to: when told to look for 7-bit ASCII only, it skips high ASCII characters (even though these may appear in "valid text" inside the binary), and when told 8-bit is okay as well, it shows accented characters as well as random stuff such as ¿, ¼, ¢ and ².
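
To make the effect of the encoding choice concrete, here is a rough Python sketch covering three of the values from the man page excerpt: `s`, `S` and `l`. The function name `scan`, the sample bytes, the use of Latin-1 to display 8-bit characters, and the retry-at-the-next-byte behaviour for 16-bit characters are all assumptions made for illustration, not a description of the real implementation.

```python
def scan(data: bytes, encoding: str = "s", min_len: int = 4) -> list[str]:
    """Return printable runs in data; only 's', 'S' and 'l' are sketched."""
    width = 2 if encoding == "l" else 1  # bytes per character
    found: list[str] = []
    run: list[int] = []
    i = 0
    while i + width <= len(data):
        code = int.from_bytes(data[i:i + width], "little")
        printable = code == 0x09 or 0x20 <= code <= 0x7E
        if encoding == "S" and 0x80 <= code <= 0xFF:
            printable = True  # 8-bit mode also accepts high bytes
        if printable:
            run.append(code)
            i += width
        else:
            if len(run) >= min_len:
                # Latin-1 is an assumption, used only to display high bytes.
                found.append(bytes(run).decode("latin-1"))
            run = []
            i += 1  # retry from the next byte so 16-bit text can realign
    if len(run) >= min_len:
        found.append(bytes(run).decode("latin-1"))
    return found


# Made-up sample: Latin-1 text with an accented letter, then 16-bit text.
sample = b"\x00caf\xe9 au lait\x01" + "wide".encode("utf-16-le") + b"\x00\x00"
print(scan(sample, "s"))  # [' au lait']
print(scan(sample, "S"))  # ['café au lait']
print(scan(sample, "l"))  # ['wide']
```

The same bytes give three different answers, and in none of the three modes does the scan know which one is "right".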


As to parsing, you can infer from the man page there is a single exception:

Do not scan only the initialized and loaded sections of object files; scan the whole files.

where this "object file" is an executable type that your system supports. This may be purely pragmatic: executable binary headers are easily recognized and parsed (an example for "ELF" on SO itself), and one is mostly interested in the text stored in the executable/data part of a binary, not in the more random bytes in its headers and relocation tables.

Jongware
  • Any link to a source for more info would be appreciated :) other than the source code of `strings`, of course. Also, it means that it can show some text that's not actually present in the binary file, right? – kamalbanga Aug 26 '14 at 10:59
  • Most of this information comes from the `man` page, but I indeed checked a couple of source codes as well. What do you mean by "it can show some text that's not actually present in the binary file"? I don't think that has been stated anywhere. – Jongware Aug 26 '14 at 11:31
  • I mean that it can read consecutive bytes 0x48 0x65 0x6C 0x6C 0x6F which aren't meant to be "Hello" in the source file, and `strings` can still output them as "Hello". – kamalbanga Aug 26 '14 at 12:11
  • No - determining the *meaning*, that is, what is data and what is not, is not something that `strings` attempts. This is actually a huge gray area; on finding some bytes, you may need to *run* the code to see what they represent. As a borderline case, they may even represent both code *and* data. – Jongware Aug 26 '14 at 12:22
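
The case raised in these comments is easy to reproduce. In the small Python sketch below, two integers are written out as 32-bit little-endian values, and a printable-run scan (approximated here with a regular expression, which is an assumption and not how `strings` itself is written) reports text that nobody put there as text. The two numbers were chosen for the example.

```python
import re
import struct

# Two unsigned 32-bit integers, stored little-endian as a program might
# store numeric data. They were never intended to be text.
values = struct.pack("<II", 1819043144, 111)

# Approximation of the scan: runs of 4 or more printable ASCII bytes or tabs.
found = re.findall(rb"[\t\x20-\x7e]{4,}", values)

print(values)  # b'Hello\x00\x00\x00'
print(found)   # [b'Hello']
```

The bytes really are in the file, so the output is not invented, but whether they were meant as text is something the scan cannot tell.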

For each file given, GNU strings prints the printable character sequences that are at least 4 characters long (or the number given with the options below) and are followed by an unprintable character. By default, it only prints the strings from the initialized and loaded sections of object files; for other types of files, it prints the strings from the whole file.

strings is mainly useful for determining the contents of non-text files.

AkhlD