In bash I can do

strings someBinaryfile.exe

on an .exe (or .dll, or .so) file, and it prints just the human-readable portions of the binary.

Is there a similar library in Java? I know how to open a file and print it, but I need only the human-readable portion.

Inian
RonPringadi
  • Thanks for formatting the question @Inian – RonPringadi Jan 20 '17 at 18:10
    `strings` is not a `bash` command; it's a standalone program that you could run from a Java program just as well as from a `bash` script. – chepner Jan 20 '17 at 18:31
  • I understand @chepner - that aside, is there a library in Java that serves a similar purpose? For example Apache Commons (although I cannot find a similar function in the Apache Commons library) – RonPringadi Jan 20 '17 at 19:05

1 Answer


I am not aware of any pre-existing Java library that completely replicates the functionality of strings. If you want to consider implementing it yourself, then we can read the Linux man page for strings to get a better idea of the requirements:

For each file given, GNU strings prints the printable character sequences that are at least 4 characters long (or the number given with the options below) and are followed by an unprintable character.

Therefore, if you wanted to implement your own solution in pure Java code, then you could read through each byte of the file, check if that byte is printable, and store the sequence of these bytes in a buffer. Then, once you encounter a non-printable character, print the contents of the buffer if the buffer contains at least 4 bytes. For example:

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.File;
import java.io.IOException;

class Strings {

    private static final int MIN_STRING_LENGTH = 4;

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            File f = new File(arg);
            if (!f.exists()) {
                System.err.printf("error: no such file or directory: %s%n", arg);
                continue;
            }
            if (!f.canRead()) {
                System.err.printf("error: permission denied: %s%n", arg);
                continue;
            }
            if (f.isDirectory()) {
                System.err.printf("error: path is directory: %s%n", arg);
                continue;
            }
            try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(f));
                        ByteArrayOutputStream os = new ByteArrayOutputStream()) {
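                for (int b = is.read(); b != -1; b = is.read()) {
                    // Printable ASCII runs from space (0x20) through tilde (0x7E).
                    if (b >= 0x20 && b < 0x7F) {
                        os.write(b);
                    } else {
                        // A non-printable byte ends the current run; print it only if long enough.
                        if (os.size() >= MIN_STRING_LENGTH) {
                            System.out.println(new String(os.toByteArray(), "US-ASCII"));
                        }
                        os.reset();
                    }
                }
                // Flush a run that extends to the end of the file.
                if (os.size() >= MIN_STRING_LENGTH) {
                    System.out.println(new String(os.toByteArray(), "US-ASCII"));
                }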
            }
        }
    }
}

That would cover a basic approximation of the strings functionality, but there are further details to consider:

By default, it only prints the strings from the initialized and loaded sections of object files; for other types of files, it prints the strings from the whole file.

Implementing this part gets more complicated, because you would need to parse and understand the different sections of the binary file format, such as ELF or Windows PE.
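As a first step toward that, you would at least need to recognize which format a file uses. The following is only a minimal sketch of that detection step, based on the well-known magic numbers: ELF files begin with the bytes 0x7F 'E' 'L' 'F', and PE files begin with a DOS header ("MZ") whose 32-bit little-endian field at offset 0x3C points to the "PE\0\0" signature. The class name BinaryFormat and the Kind enum are hypothetical names chosen for illustration. Walking the section headers to find the initialized and loaded sections is not shown here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class BinaryFormat {

    // Hypothetical classification used only for this illustration.
    enum Kind { ELF, PE, OTHER }

    static Kind detect(byte[] data) {
        // ELF: the file starts with 0x7F 'E' 'L' 'F'.
        if (data.length >= 4 && data[0] == 0x7F
                && data[1] == 'E' && data[2] == 'L' && data[3] == 'F') {
            return Kind.ELF;
        }
        // PE: DOS header "MZ", then a little-endian offset at 0x3C to "PE\0\0".
        if (data.length >= 0x40 && data[0] == 'M' && data[1] == 'Z') {
            int peOffset = ByteBuffer.wrap(data, 0x3C, 4)
                    .order(ByteOrder.LITTLE_ENDIAN)
                    .getInt();
            if (peOffset >= 0 && peOffset <= data.length - 4
                    && data[peOffset] == 'P' && data[peOffset + 1] == 'E'
                    && data[peOffset + 2] == 0 && data[peOffset + 3] == 0) {
                return Kind.PE;
            }
        }
        // Anything else would be scanned as a whole file.
        return Kind.OTHER;
    }
}
```

A file classified as OTHER could be passed straight to the whole-file scan, while ELF and PE files would need a real section parser before scanning.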

An additional complication is character encoding:

-e encoding
--encoding=encoding
    Select the character encoding of the strings that are to be found. Possible values for encoding are: s = single-7-bit-byte characters (ASCII, ISO 8859, etc., default), S = single-8-bit-byte characters, b = 16-bit bigendian, l = 16-bit littleendian, B = 32-bit bigendian, L = 32-bit littleendian. Useful for finding wide character strings. (l and b apply to, for example, Unicode UTF-16/UCS-2 encodings).

The simpler logic I described above assumed single-byte characters. If you need to identify strings in encodings with multi-byte characters, then the logic will need to be more careful about managing the buffer, checking for printability and checking string length.
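To illustrate what changes for wide characters, here is a rough sketch of the 16-bit little-endian case (the l encoding), restricted to code units that hold printable ASCII. The class and method names (WideStrings, extractUtf16Le) are hypothetical. It works on a byte array rather than a stream, counts the minimum length in characters rather than bytes, and steps forward one byte after a mismatch so that strings starting at odd offsets are still found. It is an approximation, not a reproduction of what strings does internally.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class WideStrings {

    // Finds runs of printable ASCII characters stored as 16-bit little-endian units.
    static List<String> extractUtf16Le(byte[] data, int minLength) {
        List<String> found = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        int i = 0;
        while (i + 1 < data.length) {
            int low = data[i] & 0xFF;
            int high = data[i + 1] & 0xFF;
            if (high == 0 && low >= 0x20 && low < 0x7F) {
                // A printable character occupies two bytes: value, then zero.
                run.append((char) low);
                i += 2;
            } else {
                // The run ended; keep it only if it has enough characters.
                if (run.length() >= minLength) {
                    found.add(run.toString());
                }
                run.setLength(0);
                i += 1; // advance one byte so a new run may start at any offset
            }
        }
        if (run.length() >= minLength) {
            found.add(run.toString());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] data = Files.readAllBytes(Paths.get(arg));
            extractUtf16Le(data, 4).forEach(System.out::println);
        }
    }
}
```

Reading the whole file into memory keeps the sketch short; a streaming version would need to buffer at least one byte of lookahead.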

There are numerous other arguments that you can pass to strings, all described in the man page. If you need to fully reproduce all of that functionality, then it will further complicate the logic.
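As one small example of how options add to the logic, the sketch below approximates two of them: a configurable minimum length (what -n does) and printing the offset of each string in decimal (roughly what -t d does). The class name OffsetStrings and its extract method are hypothetical, and the output layout is simplified rather than matched to the exact column format that strings uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class OffsetStrings {

    // Returns each printable ASCII run of at least minLength bytes,
    // prefixed with the decimal byte offset at which the run starts.
    static List<String> extract(byte[] data, int minLength) {
        List<String> found = new ArrayList<>();
        int start = 0; // offset where the current run began
        // Iterate one past the end so the final run is flushed too.
        for (int i = 0; i <= data.length; i++) {
            boolean printable = i < data.length && data[i] >= 0x20 && data[i] < 0x7F;
            if (!printable) {
                if (i - start >= minLength) {
                    String text = new String(data, start, i - start, StandardCharsets.US_ASCII);
                    found.add(start + " " + text);
                }
                start = i + 1; // the next run can begin after this byte
            }
        }
        return found;
    }
}
```

Tracking the start offset instead of copying bytes into a buffer makes the offset available for free, at the cost of holding the input in memory.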

If you prefer not to implement this, then you could fork and execute strings directly via the ProcessBuilder class and parse the output. The trade-off is that it introduces an external dependency: your code must run on a platform with strings installed, and it incurs some overhead to fork and execute the external process. That trade-off might or might not be acceptable for your application depending on circumstances.
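A minimal sketch of the ProcessBuilder approach might look like the following. It assumes a program named strings is available on the PATH, and the class and method names (ExternalStrings, run, strings) are hypothetical. Standard error is passed through to the parent process so that error messages stay visible and cannot block the child.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ExternalStrings {

    // Runs a command and returns its standard output as a list of lines.
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Send the child's error messages straight to our own stderr.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.US_ASCII))) {
            for (String line = r.readLine(); line != null; line = r.readLine()) {
                lines.add(line);
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("command exited with status " + exit + ": " + command);
        }
        return lines;
    }

    // Assumption: an executable called "strings" can be found on the PATH.
    static List<String> strings(Path file) throws IOException, InterruptedException {
        return run(List.of("strings", file.toString()));
    }

    public static void main(String[] args) throws Exception {
        for (String arg : args) {
            strings(Path.of(arg)).forEach(System.out::println);
        }
    }
}
```

Passing the command as a list avoids shell quoting problems with unusual file names, since no shell is involved.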

Chris Nauroth