I am not aware of any pre-existing Java library that completely replicates the functionality of `strings`. If you want to implement it yourself, the Linux man page for `strings` gives a good idea of the requirements:
For each file given, GNU strings prints the printable character
sequences that are at least 4 characters long (or the number given
with the options below) and are followed by an unprintable character.
Therefore, if you wanted to implement your own solution in pure Java, you could read through each byte of the file, check whether that byte is printable, and accumulate the sequence of printable bytes in a buffer. Then, once you encounter a non-printable byte, print the contents of the buffer if it holds at least 4 bytes. For example:
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class Strings {

    private static final int MIN_STRING_LENGTH = 4;

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            File f = new File(arg);
            if (!f.exists()) {
                System.err.printf("error: no such file or directory: %s%n", arg);
                continue;
            }
            if (!f.canRead()) {
                System.err.printf("error: permission denied: %s%n", arg);
                continue;
            }
            if (f.isDirectory()) {
                System.err.printf("error: path is a directory: %s%n", arg);
                continue;
            }
            try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(f));
                 ByteArrayOutputStream os = new ByteArrayOutputStream()) {
                for (int b = is.read(); b != -1; b = is.read()) {
                    if (b >= 0x20 && b < 0x7F) {
                        // Printable ASCII: keep accumulating the current run.
                        os.write(b);
                    } else {
                        // A non-printable byte ends the run; print it if long enough.
                        if (os.size() >= MIN_STRING_LENGTH) {
                            System.out.println(new String(os.toByteArray(), StandardCharsets.US_ASCII));
                        }
                        os.reset();
                    }
                }
                // Flush a run that extends to the end of the file.
                if (os.size() >= MIN_STRING_LENGTH) {
                    System.out.println(new String(os.toByteArray(), StandardCharsets.US_ASCII));
                }
            }
        }
    }
}
That would cover a basic approximation of the `strings` functionality, but there are further details to consider:
By default, it only prints the strings from the initialized and loaded
sections of object files; for other types of files, it prints the
strings from the whole file.
Implementing this part gets more complicated, because you would need to parse and understand the different sections of the binary file format, such as ELF or Windows PE.
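Full section parsing is beyond the scope of this answer, but a first step is simply recognizing whether a file is an object file at all. As a rough sketch (the class and method names here are hypothetical, not from any library), you could check for the ELF magic number at the start of the file before deciding whether to restrict scanning to particular sections:

```java
import java.io.FileInputStream;
import java.io.IOException;

// Sketch: detect ELF object files by their magic number (0x7F 'E' 'L' 'F').
// A scanner could use this to decide whether section-aware handling applies;
// actual section-header parsing is not shown.
class FileKind {
    static boolean isElf(String path) throws IOException {
        byte[] magic = new byte[4];
        try (FileInputStream in = new FileInputStream(path)) {
            if (in.read(magic) != 4) {
                return false; // too short to be an ELF file
            }
        }
        return magic[0] == 0x7F && magic[1] == 'E'
            && magic[2] == 'L' && magic[3] == 'F';
    }
}
```

A similar check against the `MZ`/`PE` signatures would cover Windows PE files.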
An additional complication is character encoding:
-e encoding
--encoding=encoding
    Select the character encoding of the strings that are to be found.
    Possible values for encoding are: s = single-7-bit-byte characters
    (ASCII, ISO 8859, etc., default), S = single-8-bit-byte characters,
    b = 16-bit bigendian, l = 16-bit littleendian, B = 32-bit bigendian,
    L = 32-bit littleendian. Useful for finding wide character strings.
    (l and b apply to, for example, Unicode UTF-16/UCS-2 encodings).
The simpler logic I described above assumed single-byte characters. If you need to identify strings in encodings with multi-byte characters, then the logic will need to be more careful about managing the buffer, checking for printability and checking string length.
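As a rough sketch of how the 16-bit little-endian case (`-e l`) might look: the version below only detects characters from the printable ASCII range encoded as UTF-16LE, and only scans at even byte offsets, so it is a simplification of what the real tool does.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: find printable-ASCII strings stored as 16-bit little-endian
// (UTF-16LE) code units, roughly what `strings -e l` would report for
// such data. Scans only at even offsets for simplicity.
class WideStrings {
    static List<String> findUtf16LeStrings(byte[] data, int minLength) {
        List<String> found = new ArrayList<>();
        ByteArrayOutputStream run = new ByteArrayOutputStream();
        for (int i = 0; i + 1 < data.length; i += 2) {
            int lo = data[i] & 0xFF;
            int hi = data[i + 1] & 0xFF;
            // A printable ASCII character in UTF-16LE has a zero high byte.
            if (hi == 0 && lo >= 0x20 && lo < 0x7F) {
                run.write(lo);
            } else {
                // A non-matching code unit ends the current candidate string.
                if (run.size() >= minLength) {
                    found.add(new String(run.toByteArray(), StandardCharsets.US_ASCII));
                }
                run.reset();
            }
        }
        if (run.size() >= minLength) {
            found.add(new String(run.toByteArray(), StandardCharsets.US_ASCII));
        }
        return found;
    }
}
```

The other encodings (`b`, `B`, `L`) would follow the same pattern with different code-unit widths and byte orders.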
There are numerous other arguments that you can pass to `strings`, all described in the man page. If you need to fully reproduce all of that functionality, then it will further complicate the logic.
If you prefer not to implement this yourself, then you could fork and execute `strings` directly via the `ProcessBuilder` class and parse its output. The trade-off is that this introduces an external dependency: your code must run on a platform with `strings` installed, and it incurs some overhead to fork and execute the external process. That trade-off might or might not be acceptable for your application, depending on circumstances.
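A minimal sketch of that approach could look like the following. It assumes the command you pass (here, `strings`) is on the `PATH`, and it does not handle timeouts or very large outputs:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: run an external command and collect its output lines.
// Intended use: ExternalStrings.run("strings", "/path/to/binary").
class ExternalStrings {
    static List<String> run(String... command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout for simplicity
        Process p = pb.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.US_ASCII))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("command exited with status " + exit);
        }
        return lines;
    }
}
```

Reading the output to completion before calling `waitFor()` matters: if you wait first, a process that fills its output pipe can deadlock.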