34

I am wondering how File.exists() works. I don't know much about how filesystems work, so maybe I should start reading there first.

But as a quick preliminary question:

Is a call to File.exists() a single action for the filesystem, if that path and filename are registered in some journal? Or does the OS get the content of the directory and then scan through it for matches?

I presume this will be filesystem dependent, but maybe all filesystems use the quick approach?

I'm not talking about network and tape systems. Let's keep it to ntfs, extX, zfs, jfs :-)

Jens
Franz Kafka
  • Journaling filesystems are totally different from others. – Dhaivat Pandya Jun 12 '11 at 09:21
  • 12
    This is going to be *very* file system dependent. When you're accessing a file on an NFS or SMB file share, it might involve establishing a network connection. If the underlying disc is powered off, you'll have to wait for it to spin up (even worse if it's an optical drive). Heck, there are situations (such as hierarchical storage) where it might involve loading a tape and literally take minutes! – Sven Jun 12 '11 at 09:24

4 Answers

16

Measure the time it takes and see for yourself. As you say, it is entirely file system dependent.

        long t1 = System.currentTimeMillis();
        // ... your File.exists() call goes here ...
        long t2 = System.currentTimeMillis();
        System.out.println("time: " + (t2 - t1) + " ms");

You will see that it gives you different results from run to run, since the time also depends on how your OS caches data, on the system load, and so on.
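As a rough sketch of the same measurement using System.nanoTime(), the example below times several consecutive calls. The class name `ExistsTiming`, the helper `timeExistsNanos`, and the temp file are assumptions made for illustration. Because the file has just been created, its metadata is very likely already cached, so this only shows the cached case.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper class; the names are illustrative only.
class ExistsTiming {

    /** Returns the elapsed time of a single File.exists() call, in nanoseconds. */
    static long timeExistsNanos(File file) {
        long start = System.nanoTime();
        file.exists();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        // A temp file keeps the example self-contained; substitute your own path.
        File file = File.createTempFile("exists-timing", ".tmp");
        file.deleteOnExit();

        // Repeat the call: the first iteration often includes JIT and class-loading
        // overhead, so later iterations are usually more representative.
        for (int i = 0; i < 5; i++) {
            System.out.println("call " + i + ": " + timeExistsNanos(file) + " ns");
        }
    }
}
```

The numbers will vary between runs and machines, so treat them as an order-of-magnitude indication rather than a benchmark.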

Costis Aivalis
  • 2
    It almost entirely depends on whether a disk access is required, in which case your disk speed (and whether it is busy) will be what matters. If you are testing when the file is in cache, you are better off with System.nanoTime() as it will typically be sub-millisecond. – Peter Lawrey Jun 12 '11 at 09:44
  • System.currentTimeMillis() doesn't necessarily even give you millisecond-resolution timing. :/ – Karl Knechtel Jun 12 '11 at 09:49
  • NanoTime is definitely more precise! Still, currentTimeMillis may give you a rough idea too. One could also factor the system load into the equation before measuring, but the results will always be approximate. Remember: you can't observe an experiment without somehow altering it... – Costis Aivalis Jun 12 '11 at 10:09
  • It is not entirely up to the FS implementation but rather to the caching policy for filesystems in an OS. You can get faster responses if the directory is cached and the file or the directory has been recently used. – Daniel Voina Jun 12 '11 at 10:50
  • @Costis currentTimeMillis is accurate to about 15ms on a Windows platform, which really isn't that great for something that may take far less time. – Voo Jun 12 '11 at 12:04
  • 3
    Why do you autobox `long` to `Long` only to convert it back to a `long` during subtraction? – Steve Kuo Jun 12 '11 at 18:24
  • Thank you Steve! It should be long! I've changed it. – Costis Aivalis Jun 12 '11 at 18:33
  • @Costis, nanoTime is weird, and why has been discussed multiple times (TSC http://en.wikipedia.org/wiki/Time_Stamp_Counter values differ in multi-socket systems), so it takes a hypervisor to intercept it and so on. If there is none, you may end up with what you get from `System.currentTimeMillis()`, so it's OS/hardware dependent. – bestsss Jun 16 '11 at 07:35
  • Last I checked, currentTimeMillis only ever changed in chunks of about 10-15 milliseconds. As that is approximately the time a thread runs before a context switch, I would guess it is updated on every context switch. Regardless, it's no good for timing anything that is fast. I'm not familiar with nanoTime, though hooking into QueryPerformanceCounter is definitely a great way to time things. –  Oct 18 '13 at 22:20
15

How this operation is performed the first time is entirely dependent on the filesystem. This is done by the OS, and Java doesn't play any part.

In terms of performance, the first check requires a read from disk in all cases. This typically takes 8-12 ms. @Sven points out that some storage could be slower, but this is relatively rare in cases where performance is important. You may have an additional delay if this is a network file system (usually relatively small, but it depends on your network latency).

Everything else the OS and Java does is very short by comparison.

However, if you check that the file exists repeatedly, a disk access may not be required because the information can be cached. In that case, what matters is the time the OS takes and the resources Java uses. One of the largest of these costs is the objects File.exists() creates (you wouldn't think it would create any): it encodes the file's name on every call, creating a lot of objects. If you put File.exists() in a tight loop it can create 400 MB of garbage per second. :(
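If repeated checks of the same path really are a hot spot, one possible workaround is to remember recent results for a short time instead of calling File.exists() every time. The sketch below is only an illustration: `ExistsCache` and its time-to-live parameter are hypothetical names, and the trade-off is that a cached answer can be stale for up to the configured interval.

```java
import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper; not part of the JDK.
class ExistsCache {

    /** One remembered result and the time it was obtained. */
    private static final class Entry {
        final boolean exists;
        final long checkedAtNanos;

        Entry(boolean exists, long checkedAtNanos) {
            this.exists = exists;
            this.checkedAtNanos = checkedAtNanos;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;

    ExistsCache(long ttlMillis) {
        this.ttlNanos = ttlMillis * 1_000_000L;
    }

    /** Returns a cached answer if it is younger than the TTL, otherwise asks the OS. */
    boolean exists(String path) {
        long now = System.nanoTime();
        Entry cached = entries.get(path);
        if (cached != null && now - cached.checkedAtNanos < ttlNanos) {
            return cached.exists; // may be stale for up to the TTL
        }
        boolean result = new File(path).exists();
        entries.put(path, new Entry(result, now));
        return result;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("exists-cache", ".tmp");
        ExistsCache cache = new ExistsCache(60_000);

        System.out.println(cache.exists(file.getPath())); // asks the OS
        file.delete();
        System.out.println(cache.exists(file.getPath())); // served from the cache, so stale
    }
}
```

Whether this is worth it depends on how fresh the answer needs to be; for most code a plain File.exists() call is fine.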

Journaling filesystems work differently in that they keep track of all the changes you make to a file system; however, they don't change how you read the filesystem.

Peter Lawrey
8

Most of the file-related operations are not performed in Java; native code exists to perform these activities. In reality, most of the work done depends on the nature of the FileSystem object (that is backing the File object) and the underlying implementation of the native IO operations in the OS.

I'll present the case of the implementation in OpenJDK 6, for clarity. The File.exists() implementation defers the actual checks to the FileSystem class:

public boolean exists() {
    // ... calls to SecurityManager have been omitted for brevity ...
    return ((fs.getBooleanAttributes(this) & FileSystem.BA_EXISTS) != 0);
}

The FileSystem class is abstract, and an implementation exists for all supported filesystems:

package java.io;


/**
 * Package-private abstract class for the local filesystem abstraction.
 */

abstract class FileSystem

Notice the package-private nature. A Java Runtime Environment will provide concrete classes that extend the FileSystem class. In the OpenJDK implementation, there are:

  • java.io.WinNTFileSystem, for NTFS
  • java.io.Win32FileSystem, for FAT32
  • java.io.UnixFileSystem, for *nix filesystems (this is a class with a very broad responsibility).

All of the above classes delegate to native code for the getBooleanAttributes method. This implies that performance is not constrained by the managed (Java) code in this case; the implementation of the file system and the nature of the native calls being made have a greater bearing on performance.
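The delegation described above can be modelled with a small sketch. This is not the JDK source: `SketchFileSystem` and `InMemoryFileSystem` are hypothetical stand-ins, the constant values are chosen for illustration, and an in-memory map takes the place of the native call so the example runs anywhere.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the package-private java.io.FileSystem (an assumption, not the real class).
abstract class SketchFileSystem {
    // Illustrative bit flags for the attribute mask.
    static final int BA_EXISTS = 0x01;
    static final int BA_DIRECTORY = 0x04;

    /** Returns a bit mask of attributes for the path; 0 means it does not exist. */
    abstract int getBooleanAttributes(String path);

    /** Mirrors the shape of File.exists(): test one bit of the attribute mask. */
    boolean exists(String path) {
        return (getBooleanAttributes(path) & BA_EXISTS) != 0;
    }
}

// Hypothetical concrete implementation; the real subclasses call into native code here.
class InMemoryFileSystem extends SketchFileSystem {
    private final Map<String, Integer> attributes = new HashMap<>();

    void add(String path, boolean directory) {
        attributes.put(path, directory ? (BA_EXISTS | BA_DIRECTORY) : BA_EXISTS);
    }

    @Override
    int getBooleanAttributes(String path) {
        return attributes.getOrDefault(path, 0);
    }

    public static void main(String[] args) {
        InMemoryFileSystem fs = new InMemoryFileSystem();
        fs.add("/tmp", true);
        fs.add("/tmp/a.txt", false);

        System.out.println(fs.exists("/tmp/a.txt")); // true
        System.out.println(fs.exists("/missing"));   // false
    }
}
```

The point of the design is that the Java side only interprets a bit mask; all of the actual lookup cost sits behind getBooleanAttributes.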

Update #2

Based on the updated question -

I'm not talking about network and tape systems. Let's keep it to ntfs, extX, zfs, jfs

Well, that still doesn't matter. Different operating systems will implement support for different file systems in different ways. For example, NTFS support in Windows will be different from the one in *nix, because the operating system will also have to do its share of bookkeeping, in addition to communicating with devices via their drivers; not all the work is done in the device.

In Windows, you will almost always find the concept of file system filter drivers, which manage the task of communicating with other file system filter drivers or with the file system. This is necessary to support various operations; one example would be the use of filter drivers by anti-virus engines and other software (on-the-fly encryption and compression products) to intercept IO calls.

In *nix, you will have the stat() system call, which performs the necessary work of reading the inode information for the file at the given path.
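From Java 7 onwards, the closest thing to a stat()-style query in application code is probably Files.readAttributes, which returns existence, type, and size from one attribute read. The sketch below assumes that API is available; `StatSketch` and `describe` are hypothetical names, and the claim that the JDK backs this with a stat-family call on Unix-like systems is an assumption about the implementation, not something the API guarantees.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper class; the names are illustrative only.
class StatSketch {

    /** Describes a path using a single attribute read instead of several File.* calls. */
    static String describe(Path path) throws IOException {
        try {
            BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
            String kind = attrs.isDirectory() ? "directory"
                    : attrs.isRegularFile() ? "regular file"
                    : "other";
            return kind + ", " + attrs.size() + " bytes";
        } catch (NoSuchFileException e) {
            // Thrown when the path does not exist.
            return "does not exist";
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stat-sketch", ".tmp");
        Files.write(tmp, new byte[] {1, 2, 3});

        System.out.println(describe(tmp)); // regular file, 3 bytes
        Files.delete(tmp);
        System.out.println(describe(tmp)); // does not exist
    }
}
```

This is useful when the code needs more than a yes/no answer, since it avoids separate exists(), isDirectory(), and length() calls for the same file.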

Vineet Reynolds
1

It's very fast on any modern machine: my tests show 0.028 millis (28 microseconds) per call on my 2013 Mac w/SSD

1,000 files created in 307 millis, 0.307 millis per file

1,000 .exists() done in 28 millis, 0.028 millis per file

Here's a test in Groovy (which runs on the JVM)

def index() {
    File fileWrite
    // Make sure the test directory exists before writing into it
    new File("/tmp/fileSpeedTest").mkdirs()

    long start = System.currentTimeMillis()

    // Create 1,000 small files
    (1..1000).each {
        fileWrite = new File("/tmp/fileSpeedTest/${it}.txt")
        fileWrite.write('Some nice text')
    }
    long diff = System.currentTimeMillis() - start
    println "1,000 files created in $diff millis, ${diff/1000.0} millis per file"

    // Check that each of the 1,000 files exists
    start = System.currentTimeMillis()
    (1..1000).each {
        fileWrite = new File("/tmp/fileSpeedTest/${it}.txt")
        if ( ! fileWrite.exists() )
            throw new Exception("where's the file")
    }
    diff = System.currentTimeMillis() - start
    println "1,000 .exists()   done in  $diff millis, ${diff/1000.0} millis per file"

}
Travis May