I have tens or even hundreds of thousands of files to list. I thought this would be straightforward: for example, `find -iname "*.abc" | wc -l` runs instantly on my Ubuntu laptop. Unfortunately, the equivalent Java code based on the good old File API is quite slow.

The reason seems to be that each File instance contains a lot of metadata, while the find command is smart enough to ignore everything not strictly needed in its search.

It seems that NIO.2 has some "new" constructs to make our lives better: a visitor-based API and a DirectoryStream API. But they still seem to lag behind find.

What is the fastest approach in Java when all we need is to list (or, to keep it simple for now, count) huge quantities of files in a set of folders?

Thanks

devoured elysium
  • 1
    Well, _find_ will itself be making low-level syscalls, and will definitely be much quicker than Java, as you have the JVM and lots of overhead compared to the simplicity of find. Can you share some real-world examples? What code have you tried? (Maybe you have a defect?) And what is the _actual_ performance difference between the two? – Matt Clark Sep 05 '18 at 18:27
  • 2
    Also, does `find` actually run _instantly_? Can you actually time it? `time find -iname ...` – Matt Clark Sep 05 '18 at 18:28
  • 2
    I find this hard to believe. Show some mcve with some actual numbers. – rustyx Sep 05 '18 at 18:29

1 Answer

Perhaps you could try invoking shell commands using ProcessBuilder. The code below shows how to execute the find command from Java.

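import java.io.IOException;
import java.lang.ProcessBuilder.Redirect;

public class FindRunner {
    public static void main(final String[] args) throws IOException, InterruptedException {
        // Only works where /bin/sh and find are available (Linux, macOS, ...)
        runFind();
    }

    private static void runFind() throws IOException, InterruptedException {
        // Run through a shell so that the pipe to wc is interpreted
        String[] commandList = {"/bin/sh", "-c", "find . -iname \"*.txt\" | wc -l"};
        ProcessBuilder processBuilder = new ProcessBuilder(commandList);
        processBuilder.redirectOutput(Redirect.INHERIT); // Send the process's output to this JVM's stdout
        Process process = processBuilder.start();
        process.waitFor(); // Block until the command has finished
    }
}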

The code above will only work on Unix-like systems. The only difference on Windows would be your command list:

String[] commandList = { "cmd.exe", "/C", "dir" };

Replace "dir" with whatever the Windows equivalent is of the find call you want to make.

If you need your program to work across operating systems, you could branch on the OS to run either a Windows command or a Unix command, and fall back to the File-based method you're currently using if the process can't be started for some reason.
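The branching described above could be sketched roughly as follows. This is only an illustration under some assumptions: the class and method names (`PortableFileCounter`, `commandFor`, `countWithNio`, `count`) are made up for this example, the Windows command line is an untested guess at an equivalent of the find call, and the fallback uses `Files.walk` from NIO.2 rather than the old File API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.stream.Stream;

class PortableFileCounter {

    // Picks the external command for an OS name. Both command lines are
    // assumptions for illustration; the Windows one in particular is untested.
    // The extension is spliced into a shell command, so pass trusted values only.
    static String[] commandFor(String osName, String extension) {
        if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            return new String[] {"cmd.exe", "/C",
                    "dir /s /b /a-d *" + extension + " | find /c /v \"\""};
        }
        return new String[] {"/bin/sh", "-c",
                "find . -type f -iname '*" + extension + "' | wc -l"};
    }

    // Pure-Java fallback: walks the tree and counts regular files whose name
    // ends with the extension, ignoring case.
    static long countWithNio(Path root, String extension) throws IOException {
        String suffix = extension.toLowerCase(Locale.ROOT);
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(suffix))
                    .count();
        }
    }

    // Tries the external command first; falls back to pure Java if the
    // process cannot be started or its output is not a number.
    static long count(Path root, String extension) throws IOException {
        try {
            ProcessBuilder builder =
                    new ProcessBuilder(commandFor(System.getProperty("os.name", ""), extension));
            builder.directory(root.toFile());
            builder.redirectError(ProcessBuilder.Redirect.INHERIT);
            Process process = builder.start();
            String line;
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                line = reader.readLine();
            }
            if (process.waitFor() == 0 && line != null) {
                return Long.parseLong(line.trim());
            }
        } catch (IOException | NumberFormatException e) {
            // External command unavailable or unparsable: use the fallback below.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return countWithNio(root, extension);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(count(Paths.get("."), ".txt"));
    }
}
```

Checking `startsWith("windows")` rather than `contains("win")` avoids misdetecting macOS, whose kernel name "Darwin" also contains "win".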

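If you use this method, you will have to redirect the output of the process to something you can manipulate in Java, for example by reading process.getInputStream() instead of inheriting the parent's standard output.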
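As a rough sketch of that last point, the process's standard output can be read line by line into a list. The class and method names here (`ProcessOutputReader`, `readLines`) are hypothetical, and the code assumes a Unix-like system with /bin/sh, UTF-8 file names, and no newlines inside file names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ProcessOutputReader {

    // Hypothetical helper: runs a command through /bin/sh and returns its stdout lines.
    static List<String> readLines(String shellCommand) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder("/bin/sh", "-c", shellCommand);
        // Leave stderr attached to the JVM's stderr so only stdout has to be drained.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        List<String> lines = new ArrayList<>();
        // Read until end of stream before waiting, so the child never blocks on a full pipe.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            // find also exits non-zero when it hits unreadable directories.
            throw new IOException("Command exited with status " + exitCode);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        List<String> matches = readLines("find . -iname \"*.txt\"");
        System.out.println(matches.size() + " matching paths");
    }
}
```

Dropping the `| wc -l` stage gives you the paths themselves rather than just a count, which is closer to the listing the question asks about.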

Josh Desmond