I have to read a file at a given location in HDFS and perform further operations on it. I am using the FileSystem API to watch the location:

FileSystem.listStatus(workingDir)

My problem is a file that is still growing, for example a 200 GB file being dropped at that location. The code above returns the file name/path even though the file is not yet fully copied. Is there a way to find out whether the file is fully copied using the Java API? I have read this and a few other blogs/questions but haven't found what I'm looking for.

user123
  • If you do want to wait until the file is fully copied, you could compare the file size at the source with the size at the final location. Once they match, start processing. The better solution would be to stream the file, IMO – nLee Apr 03 '18 at 20:27
  • @nLee - Thank you for your response. I don't control the source and I am the one responsible for streaming downstream... – user123 Apr 03 '18 at 22:08
  • Have you looked at the inotify interface? – shaine Apr 03 '18 at 22:28
  • @shainnif - I have, but I can't use it. Interestingly, that was the first thing I looked at. – user123 Apr 04 '18 at 04:32

1 Answer

For now, this is what I am doing, and it works. The file length could have been used too, but it wasn't reliable in my testing.

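import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

FileSystem fileSystem = FileSystem.newInstance(workingDir.toUri(), fsConfig);
// listStatus is an instance method, so call it on the FileSystem object.
FileStatus[] fileStatuses = fileSystem.listStatus(workingDir);
for (FileStatus fileStatus : fileStatuses) {
    if (fileStatus.isFile()) {
        final Path filePath = fileStatus.getPath();
        long modificationTime = fileStatus.getModificationTime();
        Thread.sleep(4000);
        // A FileStatus is a snapshot, so fetch a fresh one after the sleep;
        // re-reading the old object would always return the same time.
        long modTimeAfterSleep = fileSystem.getFileStatus(filePath).getModificationTime();
        if (modTimeAfterSleep == modificationTime) {
            System.out.println("File fully copied");
        } else {
            System.out.println("Keep fishing..");
        }
    }
}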
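
The idea here is "read a value, wait, read it again, and compare". Below is a minimal, Hadoop-free sketch of that pattern, just to make the logic easy to try out. The class `StableFileCheck`, the method `isStable`, and its parameters are hypothetical names, not part of any Hadoop API. For HDFS, I'd assume the probe wraps a fresh call such as fileSystem.getFileStatus(filePath).getModificationTime() on every poll (with the IOException handled inside the lambda), and that several unchanged polls in a row are a safer signal than a single one.

```java
import java.util.function.LongSupplier;

class StableFileCheck {

    /**
     * Polls the probe `checks` times, `intervalMillis` apart, and returns true
     * only if every poll reports the same value as the initial read.
     * The probe must fetch a fresh value on each call (not a cached snapshot).
     */
    public static boolean isStable(LongSupplier probe, int checks, long intervalMillis)
            throws InterruptedException {
        long previous = probe.getAsLong();
        for (int i = 0; i < checks; i++) {
            Thread.sleep(intervalMillis);
            long current = probe.getAsLong();
            if (current != previous) {
                return false; // value moved, so the file is presumably still being written
            }
            previous = current;
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a growing file: the reported value changes on every read.
        long[] growing = {0};
        System.out.println(isStable(() -> growing[0]++, 3, 10));

        // Stand-in for a finished file: the reported value never changes.
        System.out.println(isStable(() -> 42L, 3, 10));
    }
}
```

The first call in main prints false and the second prints true. How many polls and how long an interval are enough depends on how the writer behaves, so treat those numbers as tuning knobs rather than guarantees.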
user123