Java List Directory With Sorting Performance Issue

Question

I am building a file explorer where am sorting directories and files by name (case insensitive) and ordering directories before files, am using the following code but it is slow in terms of performance, so is there any other way to accomplish this:

File[] directories = new File(path).listFiles(File::isDirectory);
File[] files = new File(path).listFiles(File::isFile);

Arrays.sort(directories, Comparator.comparing(File::getName, String.CASE_INSENSITIVE_ORDER));
Arrays.sort(files, Comparator.comparing(File::getName, String.CASE_INSENSITIVE_ORDER));

File[] list = new File[directories.length + files.length];

System.arraycopy(directories, 0, list, 0, directories.length);  
System.arraycopy(files, 0, list, directories.length, files.length);

Chaosfire · Answer 1 · 2022-10-25T10:04:31.953

First of all, you should do benchmarking to figure out where is the exact bottleneck. How do I write a correct micro-benchmark in Java? should be a good start.

Now, an idea, which might help.

Get all files and directories in a single array. Like this you are getting file and folders with a single operation, instead of two. Accessing file and folders on the disk is not exactly fast operation, so you would want to reduce those. Also if you take a look at the implementation of listFiles(FileFilter), it iterates everything in the folder to find matches, so that's 1 less iteration of all elements.

File[] directoriesAndFiles = folder.listFiles();

As mentioned in other answer isFile, isDirectory are slow, calling them more than once is not desirable. You can define your wrapper class (record in this case) to cache the results.

public record FileWrapper(File file, String name, boolean isDirectory, boolean isFile) {
}

Write composite comparator for the sort. Judging by code you want directories first.

public class FileTypeComparator implements Comparator<FileWrapper> {

    @Override
    public int compare(FileWrapper first, FileWrapper second) {
        if (first.isDirectory() && second.isFile()) {
            return -1;
        }
        if (first.isFile() && second.isDirectory()) {
            return 1;
        }
        return 0;
    }
}

Combine it with your by name comparator:

Comparator<FileWrapper> compositeComparator = new FileTypeComparator()
    .thenComparing(FileWrapper::name, String.CASE_INSENSITIVE_ORDER);

Then build the result array with a stream

File[] result = Arrays.stream(directoriesAndFiles)
        .map(file -> new FileWrapper(file, file.getName(), file.isDirectory(), file.isFile())) //map to wrapper class to cache result of expensive methods
        .filter(fw -> fw.isFile() || fw.isDirectory()) //filter out not needed elements
        .sorted(compositeComparator) //sort with composite comparator
        .map(FileWrapper::file) //extract files from the wrapper
        .toArray(File[]::new); //collect in array

Additionally, to avoid building the initial array you can use Files.list() to get lazily populated stream.

try (Stream<Path> stream = Files.list(Path.of("initial directory"))) {
    File[] result = stream
            .map(Path::toFile)
            .map(file -> new FileWrapper(file, file.getName(), file.isDirectory(), file.isFile()))
            .filter(fw -> fw.isFile() || fw.isDirectory())
            .sorted(compositeComparator)
            .map(FileWrapper::file)
            .toArray(File[]::new);
} catch (IOException exc) {
    //handle
}

The logic gives impression that it has less sort but when I tried it it was even slower. — Akkad, Oct 24 '22 at 12:22
@user2155873 Most probably because of the repeated calls to `isFile` and `isDirectory`. I updated answer with imrpovements with caching. — Chaosfire, Oct 25 '22 at 10:06

DuncG · Answer 2 · 2022-10-25T10:49:03.890

If you have a huge number of files in a folder or are using non-SSD storage the calls to File.listFiles() and File.isDirectory (or isFile) can make the folder scan quite slow. Doing the scan in one step is possible but the sort by dir / file will still repeat the calls to isDirectory/isFile again.

Instead you should look at implementing with Files.find, which means you can read all the file attributes at the same time so that sort does not re-read file system attributes again. It is neatly handled in one stream. Here is an example which just prints the sorted items and modification times of the current folder:

public static Stream<Map.Entry<Path, BasicFileAttributes>>
find(Path dir, int maxDepth, BiPredicate<Path, BasicFileAttributes> matcher, FileVisitOption... options) throws IOException {

    // Using ConcurrentHashMap is safe to use with parallel()
    ConcurrentHashMap<Path,BasicFileAttributes> attrs = new ConcurrentHashMap<>();

    BiPredicate<Path, BasicFileAttributes> predicate = (p,a) -> (matcher == null || matcher.test(p, a)) && attrs.put(p, a) == null;
    return Files.find(dir, maxDepth, predicate, options).map(p -> Map.entry(p, attrs.remove(p)));
}

public static void main(String[] args) throws IOException {
    Path dir = Path.of(args[0]);
    int depth = 1;

    // Note it is easy to add more sort fields here:
    Comparator<Entry<Path, BasicFileAttributes>> compDirs = Comparator.comparing(entry -> entry.getValue().isRegularFile());
    Comparator<Entry<Path, BasicFileAttributes>> comparing = compDirs.thenComparing(entry -> entry.getKey().getFileName().toString(), String.CASE_INSENSITIVE_ORDER);

    try(var files = find(dir, depth, (p,a) -> true)) {
        files.sorted(comparing)
             .forEach(entry 
             -> System.out.println((entry.getValue().isDirectory() ? "DIR  ":"FILE ")+entry.getKey() +" modified "+entry.getValue().lastModifiedTime()));
    }
}

It can be made into tree scan by editing depth to Integer.MAX_VALUE.

As you aren't happy using .forEach() just save the stream as a list and process as regular for loop:

try(var files = find(dir, depth, (p,a) -> true)) {
    for (var entry : files.sorted(comparing).toList()) {
        BasicFileAttributes attr = entry.getValue();
        Path path = entry.getKey();
        System.out.println((attr.isDirectory() ? "DIR  ":"FILE ")+path +" modified "+attr.lastModifiedTime());
    }
}

Hey there. Why do we need `attrs.put(p, a) == null` in the `Files.find` predicate? — gthanop, Oct 16 '22 at 16:05
It emsures all the code is a simple expression predicate which returns true if the path is required and has additional effect of recording the Path+BasicFileAttribute. Otherwise predicate would have to be a much less neat but equivalent code block like `(p,a) -> { boolean b wanted = (matcher == null || matcher.test(p, a)); if(wanted) { attrs.put(p, a); } return wanted;}`. — DuncG, Oct 16 '22 at 16:59
It is too complex for me :) but I am trying it anyway, but how can I use the basic for loop instead of for each as am re-assigning one variable inside the loop and am receiving error regarding FINAL variable in lambda expression ? where I tried to find online but all example are using for each. — Akkad, Oct 24 '22 at 06:57
I just tried it and it didn't provide that much of performance ! — Akkad, Oct 24 '22 at 12:20
@user2155873 Using `find` scan would only help for folders with large numbers of files. If timing anything make sure you don't run it with `System.out` as in the example. — DuncG, Oct 24 '22 at 12:28
i ended by fetching files names first and then getting file attributes in a separate call, managing it user experience wise over performance looks the proper approach so far. — Akkad, Oct 24 '22 at 19:08

score 0 · Answer 3 · answered Oct 28 '22 at 13:15

I went into another direction as am developing for Linux, I used Java to run Linux file access commands using Java Process and it provided a wonderful performance, here is a sample if it helps:

List<String> output = new ArrayList<String>();
try
{
    String cmd = "find /SomePath -maxdepth 1 -mindepth 1 -type d -printf '%f\n' | sort && find /SomePath -maxdepth 1 -not -type d -printf '%f\n' | sort";
    Process p = new ProcessBuilder(cmd).start();
    BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
    String line = "";
    while ((line = reader.readLine()) != null) output.add(line);
    p.waitFor();
}
catch (IOException ex) {} 
catch (InterruptedException ex) {}

Java List Directory With Sorting Performance Issue

3 Answers3