
I've been experiencing a problem similar to (Too many open file handles) when I try to run a program on a grid computing resource. Increasing the operating system limit for the total number of open files is not an option on this resource.

I tried to catch and handle the exception, but my catch block never seems to be entered. The exception reports itself as a FileNotFoundException. One of the places it is thrown is in the method shown below:

public static void saveImage(BufferedImage bi, String format, File aFile) {
  try {
    if (bi != null) {
      try {
        //System.out.println("ImageIO.write(BufferedImage,String,File)");
        System.err.println("Not really an error, just a statement to help with debugging");
        ImageIO.write(bi, format, aFile);
      } catch (FileNotFoundException e) {
        System.err.println("Trying to handle " + e.getLocalizedMessage());
        System.err.println("Wait for 2 seconds then trying again to saveImage.");
        //e.printStackTrace(System.err);
        // This can happen because of too many open files.
        // Try waiting for 2 seconds and then repeating...
        try {
          synchronized (bi) {
            bi.wait(2000L);
          }
        } catch (InterruptedException ex) {
          Logger.getLogger(Generic_Visualisation.class.getName()).log(Level.SEVERE, null, ex);
        }
        saveImage(bi, format, aFile);
      } finally {
        // There is nothing to go in here as ImageIO deals with the stream.
      }
    }
  } catch (IOException e) {
    Generic_Log.logger.log(
        Generic_Log.Generic_DefaultLogLevel, //Level.ALL,
        e.getMessage());
    String methodName = "saveImage(BufferedImage,String,File)";
    System.err.println(e.getMessage());
    System.err.println("Generic_Visualisation." + methodName);
    e.printStackTrace(System.err);
    System.exit(Generic_ErrorAndExceptionHandler.IOException);
  }
}

Here is a snippet from System.err from one occasion when the problem occurred:

Not really an error, just a statement to help with debugging   
java.io.FileNotFoundException: /data/scratch/lcg/neiss140/home_cream_292126297/CREAM292126297/genesis/GENESIS_DemographicModel/0_99/0/data/Demographics/0_9999/0_99/39/E02002367/E02002367_Population_Male_End_of_Year_Comparison_2002.PNG (Too many open files) 
  at java.io.RandomAccessFile.open(Native Method)
  at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
  at javax.imageio.stream.FileImageOutputStream.<init>(FileImageOutputStream.java:53)
  at com.sun.imageio.spi.FileImageOutputStreamSpi.createOutputStreamInstance(FileImageOutputStreamSpi.java:37) 
  at javax.imageio.ImageIO.createImageOutputStream(ImageIO.java:393) 
  at javax.imageio.ImageIO.write(ImageIO.java:1514) 
  at uk.ac.leeds.ccg.andyt.generic.visualisation.Generic_Visualisation.saveImage(Generic_Visualisation.java:90) 
  at uk.ac.leeds.ccg.andyt.generic.visualisation.Generic_Visualisation$ImageSaver.run(Generic_Visualisation.java:210) 
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) 
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
  at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
  at java.lang.Thread.run(Thread.java:662)

I have some ideas for working around this issue, but does anyone know what is wrong?

(I tried to post a version of this question as an answer to the question linked above, but it was deleted by a moderator.)

  • If your `try` clause is not appearing to catch the exception, it could be because you've referenced the wrong class. Are you definitely catching the `FileNotFoundException` from `java.io`? – Duncan Jones Jan 22 '13 at 14:42
  • Thanks Duncan. I suspected that I should be catching a different exception. I think I can check what exception was making the program choke, so I should do that and report back. Steven has suggested that I should be catching javax.imageio.IIOException, so I am trying that. Thanks again for your suggestion. – Andy Turner Jan 22 '13 at 15:25
  • BTW I was catching java.io.FileNotFoundException thanks @DuncanJones – Andy Turner Jan 28 '13 at 16:28

1 Answer


Firstly, the write method will actually throw an IIOException, not a FileNotFoundException, if it fails to open the output stream; see the source, line 1532. That explains why your recovery code never runs.
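
To illustrate, here is a minimal sketch (the class name and the error handling are placeholders of mine, not code from the question): the catch needs to target javax.imageio.IIOException, whose cause is typically the underlying FileNotFoundException.

    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import javax.imageio.IIOException;
    import javax.imageio.ImageIO;

    public class SaveImageSketch {

        // Sketch only: IIOException extends IOException, so it must be caught first.
        public static void saveImage(BufferedImage bi, String format, File aFile) {
            if (bi == null) {
                return;
            }
            try {
                ImageIO.write(bi, format, aFile);
            } catch (IIOException e) {
                // ImageIO.write wraps the failure to create the output stream;
                // e.getCause() is typically the FileNotFoundException
                // ("Too many open files") reported by the OS.
                System.err.println("Could not write " + aFile + ": " + e.getCause());
            } catch (IOException e) {
                System.err.println("I/O error writing " + aFile + ": " + e);
            }
        }
    }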

Second, your recovery strategy is a bit dubious. You have no guarantee that whatever is using all of those file handles is going to release them in 2 seconds. Indeed, in the worst case, they may never be released.

But the most important thing is that you are focussing on the wrong part of the problem. Rather than trying to come up with a recovery mechanism, you should focus on the problem of why the application has so many file descriptors open. This smells like a resource leak. I recommend that you run FindBugs over your codebase to see if it can identify the leaky code. Everywhere your code opens an external Stream, it should have a matching close() call in a finally block to ensure that the stream is always closed; e.g.

    OutputStream os = new FileOutputStream(...);
    try {
        // do stuff
    } finally {
        os.close();
    }

or

    // Java 7 form ...
    try (OutputStream os = new FileOutputStream(...)) {
        // do stuff
    }

You commented:

"The resource I am running this on has only 1024 file handles and changing that is another issue. The program is a simulation and it writes out a large number of output files at the same time as it reads in another lot of input data. That work is threaded using an ExecutorService. The program runs to completion on another computer that has a higher file handle limit, but I want to get it to work on the resource where I am limited to fewer file handles."

So it seems like you are saying that you need to have that number of files open.

It strikes me that the root problem is in your application's architecture. It sounds like you simply have too many simulation tasks running at the same time. I suggest that you reduce the executor's thread pool size to a few less than the max number of open file descriptors.
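
As a rough sketch of what I mean (the class name, pool size and dummy tasks below are placeholders of mine, not code from the question), a fixed-size pool bounds how many tasks, and therefore how many output files, can be active at once:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class BoundedPoolSketch {
        public static void main(String[] args) throws InterruptedException {
            // Sketch only: 100 is an arbitrary placeholder; pick a value
            // comfortably below the 1024 descriptor limit, leaving headroom
            // for input files and anything else that holds descriptors.
            ExecutorService executor = Executors.newFixedThreadPool(100);
            for (int i = 0; i < 1000; i++) {
                final int task = i;
                // Each Runnable stands in for one file-writing job (e.g. an
                // ImageSaver); at most 100 of them run, and hold files, at once.
                executor.submit(new Runnable() {
                    public void run() {
                        System.out.println("Writing output " + task);
                    }
                });
            }
            executor.shutdown();
            executor.awaitTermination(1, TimeUnit.HOURS);
        }
    }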

The problem is that your current strategy could lead to a form of deadlock ... where existing tasks can't make progress until new tasks start running, but the new tasks can't start until existing tasks release file descriptors.

I'm thinking you need a different approach to handling the input and output. Either buffer the complete input and/or output files in memory (etcetera) or implement some kind of multiplexor so that all of the active files don't need to be open at the same time.
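
To make the buffering idea concrete, here is a minimal sketch under my own assumptions (the class name, the semaphore limit and the choice to propagate exceptions are all placeholders, not part of the question's code): the image is encoded into a byte array in memory, so a file descriptor is held only for the brief moment the bytes are flushed to disk, and a Semaphore caps how many writers can hold one at a time.

    import java.awt.image.BufferedImage;
    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.concurrent.Semaphore;
    import javax.imageio.ImageIO;

    public class BufferedSaveSketch {

        // Sketch only: at most this many files are open for writing at once.
        // 512 is an arbitrary placeholder below the 1024 descriptor cap.
        private static final Semaphore OPEN_FILES = new Semaphore(512);

        public static void saveImage(BufferedImage bi, String format, File aFile)
                throws IOException, InterruptedException {
            // 1. Encode the image entirely in memory; no file descriptor is used here.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            ImageIO.write(bi, format, buffer);
            byte[] bytes = buffer.toByteArray();

            // 2. Hold a descriptor only for the short time it takes to dump the bytes.
            OPEN_FILES.acquire();
            try {
                OutputStream os = new FileOutputStream(aFile);
                try {
                    os.write(bytes);
                } finally {
                    os.close();
                }
            } finally {
                OPEN_FILES.release();
            }
        }
    }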

Stephen C
  • Thanks, that is helpful. I will try catching IIOException and see if my handling works. The resource I am running this on has only 1024 file handles and changing that is another issue. The program is a simulation and it writes out a large number of output files at the same time as it reads in another lot of input data. That work is threaded using an ExecutorService. The program runs to completion on another computer that has a higher file handle limit, but I want to get it to work on the resource where I am limited to fewer file handles. Thanks again. – Andy Turner Jan 22 '13 at 15:11
  • P.S. I am confident I am closing open streams correctly, but thanks for pointing me to FindBugs (http://findbugs.sourceforge.net/); it looks like a useful resource. – Andy Turner Jan 22 '13 at 15:29
  • @StephenC How do you explain the `FileNotFoundException` in the stack trace? – Duncan Jones Jan 22 '13 at 15:30
  • 1
    @DuncanJones - either he's using a different version of Java (that doesn't catch IOException and wrap it) or he's only shown us a partial stacktrace. I suspect the latter. – Stephen C Jan 22 '13 at 22:56
  • As well as catching FileNotFoundException, I also tried catching IIOException and using the same handling. I plan to try again, also catching the general IOException and doing the same handling, but I expect this to fail too. I will confirm Java versions and provide links to a complete stack trace shortly. – Andy Turner Jan 23 '13 at 10:07
  • At the point of failure, the simulation program is probably simultaneously writing thousands of output files. It may also be trying to read an input file used in the next stage of the simulation, or it may have completed that and be further on. It will not have completed that stage of simulation and be writing further output. I can write more diagnostic output to be clearer about what is happening at the point of failure. – Andy Turner Jan 23 '13 at 10:25
  • No files written as output using the ExecutorService are opened again for reading. However, there could be a potential for deadlock, as the simulation is memory intensive and swaps data between memory and file stores in the implementation, requiring file handles for this swapping. Limiting the ExecutorService pool is a good idea which I'll try next. Thanks again for all your help and feedback on this. – Andy Turner Jan 23 '13 at 10:30
  • I put the full stack trace from the latest run online: http://www.geog.leeds.ac.uk/people/a.turner/personal/blog/archive/2013/01/23/std_0.err – Andy Turner Jan 23 '13 at 13:12
  • There is a mixture of different errors and exceptions reported there. Sorry about that. It could be that the NullPointerException is what makes the program return in error. – Andy Turner Jan 23 '13 at 13:15
  • @Stephen C The Java version information is: java version "1.6.0_37" Java(TM) SE Runtime Environment (build 1.6.0_37-b06) Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode) – Andy Turner Jan 23 '13 at 13:17
  • The simulation writes 154 sets of output. For each set there are 23 images and 28 other types of file, so in total this is 7854 files. Whilst there is no need to have all these files open for writing, it might be slow to write them one after another, but then again it might not. Perhaps this was one of the things @StephenC was getting at. Anyway, I am giving up on creating the image outputs, as there is no reliable way to create the BufferedImage with multi-threading on the Grid Computing resource I am using, so I am going to avoid multi-threading and will write the other files serially... – Andy Turner Jan 24 '13 at 16:03
  • ... So, effectively I'm running away from this without really knowing what the problem or the answer was. Thanks again to everyone that tried to help. I hope that what is left here is of use to someone. I will try to keep that full stack trace online. – Andy Turner Jan 24 '13 at 16:05