
I am having an issue loading a large number of images via a ForkJoinPool. I am testing on a 4-core Intel CPU with hyper-threading, so 8 logical threads, but I limit the pool to only 4 threads. I receive errors from ImageIO saying it cannot find the image.

import java.awt.color.CMMException;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.concurrent.RecursiveAction;
import javax.imageio.ImageIO;

public class LoadImages extends RecursiveAction {
    private static final long serialVersionUID = 1L;

    // this is an example
    private static int threadThreshold = totalImages / totalThreads + 2;

    private String[] imgArr;
    private int arrStart = 0;
    private int arrSize = 0;

    public LoadImages(String[] imgs, int start, int size) {
        imgArr = imgs;
        arrSize = size;
        arrStart = start;
    }

    protected void processImages() {
        BufferedImage img = null;
        for (int i = arrStart; i < arrStart + arrSize; i++) {
            try {
                img = ImageIO.read(new File(imgArr[i]));
            } catch (IOException | CMMException | NullPointerException e) {
                System.out.println(imgArr[i]);
                e.printStackTrace();
                img = null;
            }

            ...

        }
    }

    @Override
    protected void compute() {
        // At or below the threshold, process this chunk directly;
        // otherwise split the range in half and recurse.
        if (arrSize <= threadThreshold) {
            processImages();
        } else {
            int split = arrSize / 2;
            invokeAll(new LoadImages(imgArr, arrStart, split),
                      new LoadImages(imgArr, arrStart + split, arrSize - split));
        }
    }
}

Any insight into what I am doing wrong would be great. I notice it only breaks once I have 1700+ images, with every image 5 MB or larger.

Here is the error I am receiving from Java:

javax.imageio.IIOException: Can't create an ImageInputStream!
at javax.imageio.ImageIO.read(Unknown Source)

Yet I know the file is there. I used this code as a guide: https://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html

java_joe
  • Your first mistake is in compute() with invokeAll(). You should be using fork() for the first LoadImages and compute() for the second LoadImages. Look at the JavaDoc for ForkJoinPool. invokeAll() waits for ALL invoked tasks to complete before moving on. – edharned Dec 11 '14 at 23:07
  • @edharned I see your point, and I tried that; I actually started there. However, if I do loadImg1.fork(); loadImg2.compute(); loadImg1.join(); it does the same thing, am I wrong? Look at the example I followed: https://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html Either way I still get the same error. – java_joe Dec 12 '14 at 03:05
  • For better speed yet, try setting `ImageIO.useCache(false)`. Also, I think you should update the question with the `IOException` mentioned in the comments to one of the answers below. This is most probably the cause. Maybe you are hitting a limit to how many open files your process can have or similar? – Harald K Dec 12 '14 at 08:46
  • I added the IOException to the title and I will also try setting useCache to false, to see if that helps anything. – java_joe Dec 12 '14 at 14:30
  • I still think using a Queue with a thread pool would be easier (producer/consumer.) Using FJ for async processing has limitations. You could just calc the number of Tasks you need up front and fork() that many since you don't wait for completion. The error you're getting is probably what @Wes Cumberland said below. It seems you're mixing concurrency with parallelism. – edharned Dec 12 '14 at 15:01
  • @edharned Implementing a Queue with a thread pool sounds like a great idea. However, I feel that I would still get the same IO issues. I am not quite sure how I would implement a thread pool to work off a queue; could you provide an example, so I can run some tests and see if the results are different and if there is any speed change? – java_joe Dec 12 '14 at 16:03
  • There are many examples of producer/consumer queues/thread pool both here and on the internet. FJ adds a lot of overhead since it is designed for sync processing (fork then join.) You may still have a problem with: ImageIO.read(new File(imgArr[i])); in that it may not be thread safe as pointed out by @Wes Cumberland. – edharned Dec 12 '14 at 19:42
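A minimal sketch of the producer/consumer alternative discussed above, using a plain fixed-size thread pool instead of fork/join. The class name is made up, and the tiny generated PNGs are only stand-ins for the real image files:

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class ImageLoadPool {
    public static void main(String[] args) throws Exception {
        // Create a few tiny test images in a temp directory
        // (stand-ins for the real 5 MB files).
        File dir = Files.createTempDirectory("imgs").toFile();
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            File f = new File(dir, "img" + i + ".png");
            ImageIO.write(new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB), "png", f);
            paths.add(f.getAbsolutePath());
        }

        // Fixed pool of 4 worker threads; each submitted task loads one image.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger loaded = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (String p : paths) {
            futures.add(pool.submit(() -> {
                try {
                    BufferedImage img = ImageIO.read(new File(p));
                    if (img != null) loaded.incrementAndGet();
                } catch (IOException e) {
                    System.err.println("Failed: " + p);
                }
            }));
        }
        for (Future<?> f : futures) f.get(); // wait for all tasks to finish
        pool.shutdown();
        System.out.println("loaded=" + loaded.get());
    }
}
```

The work queue here is the executor's own internal queue, which is what makes this simpler than hand-rolling a producer/consumer pair.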

3 Answers


Seems kind of random. My guess is it could just be a hardware or OS error. Assuming this is a scaling issue, my advice with your 1700+ images is that you'd probably be better off setting this up in the cloud somewhere; it could save a lot of time and headaches.

ControlAltDel
  • I would agree that it could definitely be a hardware/IO hard-drive issue. However, I have tried putting a while loop around the try/catch and I get an endless loop. It seems that once it fails to access the file just once, it fails every time... but if I close the program down and re-run it a second time, everything works... so weird! But if I wait a day and re-run, I get the same issue, and then it works the second time. However, if I place the images on a NAS such as a Thecus (current backup storage), I can get it to fail every time no matter how many times I run the program. – java_joe Dec 11 '14 at 20:44

It seems to me like an ImageIO error that occurs when it internally creates the ImageInputStream. Did you try reading the images through an explicit ImageInputStream? Like:

InputStream is = new FileInputStream("path");
ImageInputStream iis = ImageIO.createImageInputStream(is);
BufferedImage bufImage = ImageIO.read(iis);
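For what it's worth, a self-contained variant of the snippet above that also closes the underlying stream. The class name is made up and the generated PNG is only a stand-in for a real file; since whether ImageIO.read(ImageInputStream) closes the image stream itself depends on the code path taken, closing the underlying InputStream ourselves is the safe bet:

```java
import javax.imageio.ImageIO;
import javax.imageio.stream.ImageInputStream;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

public class StreamRead {
    public static void main(String[] args) throws Exception {
        // Stand-in for one of the real image files.
        File f = File.createTempFile("test", ".png");
        ImageIO.write(new BufferedImage(3, 2, BufferedImage.TYPE_INT_RGB), "png", f);

        BufferedImage img;
        try (InputStream is = new FileInputStream(f)) {
            // We manage the underlying InputStream here; the ImageInputStream
            // wrapping is the same pattern as the answer above.
            ImageInputStream iis = ImageIO.createImageInputStream(is);
            img = ImageIO.read(iis);
        }
        System.out.println(img.getWidth() + "x" + img.getHeight());
    }
}
```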
  • Just tried that, and it still seems to be a related issue: java.io.FileNotFoundException: C:\jpg\IMG_7734.jpg (The system cannot open the file) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.(Unknown Source) at java.io.FileInputStream.(Unknown Source), and the file is definitely there. – java_joe Dec 11 '14 at 20:54
  • Did you try to synchronize the reading? Is it always the same files or random ones? – Govinda S. R. Dec 12 '14 at 19:05
  • I am not sure I understand what you mean. The array is built in a different class from walkFileTree. What would I put the synchronized on? The InputStream? – java_joe Dec 12 '14 at 23:11

If you inspect the source for ImageIO.read(File) and ImageIO.read(ImageInputStream), you can see that ImageIO reuses instances of ImageReader, and this article says that ImageReader is not thread-safe. You'll probably have to create your own ImageReaders for use in the separate threads.

Also, you should measure how much this multithreaded-IO strategy really gains you. If you're trying to pull gigabytes of image data off a spinning hard drive, your process will probably be I/O bound, and parallelizing the loading won't gain you much.
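A sketch of what a per-thread reader could look like. ImageIO.getImageReaders hands back a fresh ImageReader instance on each call, so as long as each thread obtains its own and never shares it, this sidesteps the shared-reader concern. The class and method names are made up, and the generated PNG is only a placeholder for a real file:

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.Iterator;

public class PerThreadReader {
    public static BufferedImage readWithOwnReader(File file) throws Exception {
        try (ImageInputStream iis = ImageIO.createImageInputStream(file)) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) return null;
            // Fresh ImageReader instance, never shared across threads.
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis);
                return reader.read(0);
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("t", ".png");
        ImageIO.write(new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB), "png", f);
        BufferedImage img = readWithOwnReader(f);
        System.out.println(img.getWidth() + "x" + img.getHeight());
    }
}
```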

Wes Cumberland
  • It actually saves a heck of a lot of time. SSDs never seem to fail (only spinning disk drives, which makes me think it could be a hardware issue). Loading about 5 GB of images with 4 threads truly takes about 1/3 to 1/4 of the time. I also feel that it is not ImageIO causing the issue, because if I use an InputStream it still fails. I will try creating a new reader in every thread to see if that solves the issue. Thanks! – java_joe Dec 11 '14 at 20:51
  • No dice here, seems like a related issue. However, in reading this thread: http://stackoverflow.com/a/26300361/3015634 the conclusion seems to be that thread-safety is not Java's main concern for ImageIO, and yet it seems that it still is an issue here. – java_joe Dec 12 '14 at 03:52
  • The thing about `ImageReader`s not being thread safe does not mean you can't use them in a multithreaded environment. Unfortunately, this is a common misconception. What it does mean is that you can't share a single instance of an `ImageReader` between multiple threads. Using `ImageIO.read()` from multiple threads is safe. – Harald K Dec 12 '14 at 08:40