
In my application I watch a directory for new files.

I keep an array of current files obtained with Files.list(dir), process this list one file after another, and then reload the directory with Files.list() again.

While I have never encountered the problem myself, a coworker told me that the predecessor software had an additional check that the file is older than 3 seconds (calculated as (System.currentTimeMillis() - Files.getLastModifiedTime(path).toMillis()) > 3000), because incompletely transferred (or better: not yet fully transferred) files had gone into processing.
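
For reference, a minimal sketch of such an age check, with the current time minus the modification time so that the age comes out positive. The class name `FileAge` and the method `isOlderThan` are hypothetical names chosen for illustration, not taken from the predecessor software.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, for illustration only.
class FileAge {
    /** Returns true if the file was last modified more than minAge ago. */
    static boolean isOlderThan(Path file, Duration minAge) throws IOException {
        Instant lastModified = Files.getLastModifiedTime(file).toInstant();
        // Age = now - lastModified (not the other way round).
        Duration age = Duration.between(lastModified, Instant.now());
        return age.compareTo(minAge) > 0;
    }
}
```

A call such as `FileAge.isOlderThan(path, Duration.ofSeconds(3))` would correspond to the 3-second check. It only says that nothing touched the file recently, not that the transfer is complete.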

Can I assume that the files returned by Files.list() have been fully copied into the directory I am watching?

Is there a cleaner way to check if a file is complete? The 3-second check is more of a hack: a file several GB in size could be copied over a slow (network) connection and may not be fully transferred even after 3 seconds have passed.

a.ilchinger
  • https://stackoverflow.com/questions/1390592/check-if-file-is-already-open?answertab=votes#tab-top ? – user202729 Nov 21 '18 at 15:33
  • I don't see how this answers my question. – a.ilchinger Nov 21 '18 at 15:40
  • Usually the copier program will keep a file open. If the file is not open, then it's likely that the "copy" is finished. I don't know if it's the case for your copy program. – user202729 Nov 21 '18 at 15:42
  • Another approach is to watch files with a particular extension or naming convention. The process creating the file initially names the file something like _file1.in.progress_, and the last step of that process renames the file to something like _file1.dat_. Similarly, another approach is for the creator process to add a 0 byte flag file, for example when _file1.dat_ is complete the process also adds _file1.ready_ as a flag for your process. – Andrew S Nov 21 '18 at 16:01
  • These are all good ideas, and rather simple too. Unfortunately, I cannot control the copying process. There are multiple instances of this software running and the "ingest" is different everywhere and not under my supervision. – a.ilchinger Nov 21 '18 at 17:00
  • [WatchService](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/WatchService.html) is specifically designed to watch for changes to a directory. As in, `WatchKey newFileKey = dir.register(dir.getFileSystem().newWatchService(), StandardWatchEventKinds.ENTRY_CREATE)`. – VGR Nov 21 '18 at 17:15
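
A minimal sketch of the WatchService approach mentioned in the last comment, assuming Java 8 or later. The class `NewFileWatcher` and its method names are hypothetical. Note that an ENTRY_CREATE event is delivered when the directory entry appears, not when the writer has finished, so this replaces the repeated Files.list() polling but does not by itself tell you that a file is complete.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only.
class NewFileWatcher {
    /** Registers dir for ENTRY_CREATE events; the caller closes the returned service. */
    static WatchService watchForCreates(Path dir) throws IOException {
        WatchService watcher = dir.getFileSystem().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
        return watcher;
    }

    /** Waits up to timeoutMillis for one batch of events and returns the created paths. */
    static List<Path> awaitCreated(WatchService watcher, Path dir, long timeoutMillis)
            throws InterruptedException {
        List<Path> created = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) {
            return created; // nothing happened within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                // The event context is the file name relative to the watched directory.
                created.add(dir.resolve((Path) event.context()));
            }
        }
        key.reset(); // re-arm the key so later events are queued
        return created;
    }
}
```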

1 Answer


It is not safe to assume that a file is complete just because it appears in a directory. Another process could still be writing to it, or could hold a lock on it for a different reason.

I use the following method to check if a file is ready (this uses Java 1.8):

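import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public static boolean isFileReady(Path file) {
  // APPEND leaves existing content untouched; tryLock() returns null if another process holds the lock.
  try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
       FileLock lock = ch.tryLock()) {
    return lock != null;
  } catch (IOException ex) {
    // The file could not be opened or locked, so treat it as not ready.
    return false;
  }
}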

This tries to open the file for appending (so the existing content is left untouched, whereas opening it in a truncating mode would erase it) and to acquire a lock on it. If the lock is acquired, the file is ready to process; otherwise it is not.
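
A sketch of how such a lock check might be combined with the Files.list() polling from the question. The names `ReadyFileScanner` and `listReadyFiles` are hypothetical. The extra catch for OverlappingFileLockException is an addition for the case where the same JVM already holds a lock on the file. The sketch also assumes that the copying process actually holds a lock the platform reports; on systems where file locks are only advisory, a file may pass this check while it is still being written.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only.
class ReadyFileScanner {
    /** Returns true if the file can be opened for appending and locked. */
    static boolean isFileReady(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
             FileLock lock = ch.tryLock()) {
            return lock != null; // null: another process holds an overlapping lock
        } catch (IOException | OverlappingFileLockException ex) {
            // Not openable, or already locked by this JVM: treat as not ready.
            return false;
        }
    }

    /** Lists the regular files in dir that currently pass the lock check. */
    static List<Path> listReadyFiles(Path dir) throws IOException {
        // Files.list returns a lazily populated stream that must be closed.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(ReadyFileScanner::isFileReady)
                    .collect(Collectors.toList());
        }
    }
}
```

Files that are skipped in one pass are simply picked up again the next time the directory is listed.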

Somebody
locus2k