I have a web application where users upload images. We validate the image data via ImageIO.read() and perform a few simple transformations on the resulting BufferedImage before saving it to disk.
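For reference, the validate-then-transform step looks roughly like the sketch below. The class and method names (UploadValidator, decodeOrNull, scale) are hypothetical and only for illustration; the scaling is a stand-in for our actual transformations. It relies on ImageIO.read() returning null when no registered reader recognizes the data.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper; the names are illustrative, not taken from real code.
final class UploadValidator {

    /** Returns the decoded image, or null if the bytes are not a readable image. */
    static BufferedImage decodeOrNull(byte[] raw) {
        try {
            // ImageIO.read returns null when no registered ImageReader
            // recognizes the data, and throws IOException on a read error.
            return ImageIO.read(new ByteArrayInputStream(raw));
        } catch (IOException e) {
            return null;
        }
    }

    /** Example transformation (an assumption): scale with bilinear interpolation. */
    static BufferedImage scale(BufferedImage src, int width, int height) {
        BufferedImage dst = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

A request handler would call decodeOrNull on the uploaded bytes, reject the upload on null, and otherwise pass the BufferedImage through scale (or whatever transformation applies) before writing it to disk.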
While performing load testing, we realized that when many requests come in at the same time, they are being blocked in the ImageIO.read() call. Digging deeper, we noticed that the JPEGImageReader is synchronized and that only one BufferedImage is being created at a time.
Has anyone else come across this? I have been Googling this for a few days now and haven't found anyone else who has had this issue, so maybe I am doing something wrong. I cannot come up with any logical reason why this would be. It seems to have something to do with not being able to create individual Readers and Writers per image because of some memory leak issue, but that explanation seems rather thin to me.
EDIT: Here is the output of a performance tool that breaks down what is taking so long. I believe this is due to all of the threads waiting for the synchronization lock; see the JPEGImageReader source.
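For anyone who wants to reproduce this outside of a web container, something like the following probe should do. It is only a sketch: DecodeContentionProbe and its methods are hypothetical names, and the blank in-memory JPEG is an assumption standing in for real uploads. It decodes the same JPEG a fixed number of times on a pool of a given size and returns the elapsed time, so a single-threaded run can be compared with a multi-threaded one.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.imageio.ImageIO;

// Hypothetical timing probe; names are illustrative only.
final class DecodeContentionProbe {

    /** Builds a blank JPEG in memory to use as test input. */
    static byte[] sampleJpeg(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(img, "jpg", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Runs `decodes` ImageIO.read() calls on `threads` threads; returns elapsed milliseconds. */
    static long timeDecodes(byte[] jpeg, int threads, int decodes) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            long start = System.nanoTime();
            for (int i = 0; i < decodes; i++) {
                pending.add(pool.submit(() -> {
                    BufferedImage img = ImageIO.read(new ByteArrayInputStream(jpeg));
                    if (img == null) {
                        throw new IOException("no reader recognized the data");
                    }
                    return img.getWidth();
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every decode to finish
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Comparing, say, timeDecodes(jpeg, 1, 200) with timeDecodes(jpeg, 8, 200) on a large image gives a rough idea of how much the decodes actually overlap. The numbers will vary by JDK version and hardware, so treat them as indicative only.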
EDIT: The JAI libraries would have worked, except that OpenJDK has removed support for critical parts of them, specifically the JPEG codec.
SOLUTION: After spending a lot of time looking for an alternative and failing to find one, my best solution was to process the images asynchronously with respect to the requests. When a request comes in, the raw image data is stored and presumed to be a valid image; then an asynchronous process outside of the request threads processes the images one at a time. Because of the synchronization in the ImageIO library, there is no gain from trying to do several at once. Strictly speaking, the images could still be processed in parallel, since the library is not truly synchronous, only inefficient; there is just little to gain from doing so.
While doing the processing asynchronously adds a level of complexity, it's probably a good idea anyway as far as modifying the image goes. The drawback is that we can no longer process the original image within each request, which means that our system must assume that each upload is valid image data. When the asynchronous processor does get around to processing an image, inconsistencies in the system may occur if the data turns out to be bad.
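A minimal sketch of that arrangement, assuming a single-threaded executor as the asynchronous processor. AsyncImageProcessor, accept, and the ".processed.png" output name are hypothetical and only for illustration; the transformation step is left as a comment.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.imageio.ImageIO;

// Hypothetical sketch; names and file layout are assumptions.
final class AsyncImageProcessor {

    // One worker thread, so images are processed strictly one at a time.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /**
     * Called on the request thread: stores the raw bytes without decoding
     * them and queues the real work. The request can return immediately.
     */
    Future<Boolean> accept(byte[] raw, Path rawFile) throws IOException {
        Files.write(rawFile, raw);
        return worker.submit(() -> process(rawFile));
    }

    /** Runs on the worker thread; false means the stored data was not a valid image. */
    private static boolean process(Path rawFile) {
        try {
            BufferedImage img = ImageIO.read(rawFile.toFile());
            if (img == null) {
                return false; // bad data is only discovered here, after the request
            }
            // ... transformations on img would go here ...
            Path out = rawFile.resolveSibling(rawFile.getFileName() + ".processed.png");
            return ImageIO.write(img, "png", out.toFile());
        } catch (IOException e) {
            return false;
        }
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

The false result is where the inconsistency mentioned above surfaces: by the time it is known, the request that uploaded the data has already completed, so whatever recorded the upload as valid has to be corrected after the fact.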