
I am currently developing a Web Service that has to convert PDFs to images, downscale these images and split each scale into different tiles.

For most of our users' data, the images are not too large and the whole process fits into memory.

But especially when converting big vectorised PDFs to images, which results in resolutions of 50k × 50k pixels or more, a single BufferedImage instance easily takes up 8 GB or more of memory.

Since it is a web server and I would like to handle as many requests as possible in parallel (and even do the scaling and tiling concurrently), I need some form of memory control.

I know I will probably have to store bigger images to disk between the steps. There are actually some useful open source versions of BufferedImage which can use memory and disk (see BigBufferedImage) ... but there is a performance tradeoff especially for smaller images.

90% of the time I can do everything in memory without a problem. So I would like to ask: How can one calculate the in-memory size of a BufferedImage in advance? I looked into the Javadoc and Googled around for quite some time. I'm not really an expert on image file formats, color models and the like, and don't know where to start. Can anybody point me to the things I need to understand to do these calculations, to what accuracy it's possible, and what else I need to consider?

SmokeDetector
SakeSushiBig
  • Sounds like a good plan. What is the source of these images? What do you know in advance? If you know the width, height and number of bits per pixel, it's pretty straightforward to calculate... – Harald K Dec 21 '16 at 13:09
  • @haraldK So I can get the width and height of the images without actually loading them into memory. I tried a very simple approach by just multiplying `width * height * 4` and compared it with the `byte` array of the loaded `BufferedImage` instances. It worked for a 20MB png (+/- 100 bytes) but when I tried to calculate it for a ~50MB png the number was off by 1GB. – SakeSushiBig Dec 21 '16 at 13:49
  • For compressed image file formats, like PNG or JPEG, there's generally no relation between file size and memory consumption of the decoded image. A `BufferedImage` is always uncompressed in memory, as this is more efficient for imaging operations. Your calculation is only correct for 32-bit RGBA PNGs though. – Harald K Dec 21 '16 at 13:54
  • @haraldK so in-memory size depends on resolution and color space, right? Do you know of any way I can simply calculate the in-memory size when resolution and color space is known in advance? – SakeSushiBig Dec 21 '16 at 14:19
  • Color space is not part of the equation here (consider that an indexed image using only 4 bits per pixel and a full-color 32-bit image may both use the sRGB color space, but the latter will use 8 times more memory). You only need to know the dimensions and the number of bits per pixel (a few bytes may be wasted for padding, extra references etc., but these are negligible in the big picture). – Harald K Dec 21 '16 at 14:32

1 Answer

Generally, the memory required for the pixels of an image can be computed roughly as follows (pseudocode):

memoryNeeded = ceil(width * height * bitsPerPixel / 8.0)

Where 8.0 is the number of bits in a byte, and ceil rounds up to the nearest integer.
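
The pseudocode above might be sketched in Java roughly as follows. The class and method names are hypothetical and not part of any library. The arithmetic is done in `long`, because `width * height` already overflows an `int` for images in the 50k × 50k range.

```java
// Hypothetical helper, for illustration only.
final class ImageMemoryEstimator {

    private ImageMemoryEstimator() {
    }

    /**
     * Estimates the bytes needed for tightly packed pixel data:
     * ceil(width * height * bitsPerPixel / 8).
     */
    static long estimatePixelBytes(int width, int height, int bitsPerPixel) {
        if (width <= 0 || height <= 0 || bitsPerPixel <= 0) {
            throw new IllegalArgumentException("width, height and bitsPerPixel must be positive");
        }
        // Use long arithmetic; multiplyExact throws instead of silently overflowing.
        long pixels = Math.multiplyExact((long) width, (long) height);
        long bits = Math.multiplyExact(pixels, (long) bitsPerPixel);
        // Integer form of ceil(bits / 8.0).
        return (bits + 7) / 8;
    }
}
```

For example, `ImageMemoryEstimator.estimatePixelBytes(50_000, 50_000, 32)` gives 10,000,000,000 bytes, roughly 10 GB for a 32-bit image of that size.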

For some formats, if the value isn't directly available, you may have to compute bitsPerPixel as follows:

bitsPerPixel = sum(bitsPerSample[i] for each sample i in 0 .. samplesPerPixel - 1)
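
As a minimal sketch of that sum in Java (the class and method names are made up for illustration), assuming the bit depth of each sample is already known from the image header:

```java
// Hypothetical helper, for illustration only.
final class BitsPerPixel {

    private BitsPerPixel() {
    }

    /** Sums the bit depth of every sample (channel) that makes up one pixel. */
    static int fromSamples(int... bitsPerSample) {
        int total = 0;
        for (int bits : bitsPerSample) {
            if (bits <= 0) {
                throw new IllegalArgumentException("each sample must have a positive bit depth");
            }
            total += bits;
        }
        return total;
    }
}
```

`BitsPerPixel.fromSamples(8, 8, 8, 8)` yields 32 for 8-bit RGBA, and `BitsPerPixel.fromSamples(8, 8, 8)` yields 24 for 8-bit RGB. One caveat worth keeping in mind: a `BufferedImage` of an int-packed type such as `TYPE_INT_RGB` stores one 32-bit `int` per pixel even though only 24 bits carry color, so for such types it is safer to estimate with 32 bits per pixel.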

This will not be the exact memory requirement for a BufferedImage, as the image also holds references to a Raster, a ColorModel etc., but for large images this fixed overhead is negligible. The value computed using the formula above should be more than good enough to decide whether to allocate the image in memory or on disk.
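
One way such a decision could look is sketched below. Everything here is an assumption for illustration: the class, the `Storage` enum and the safety factor are hypothetical, and the heap figure derived from `Runtime` is only a rough snapshot that other concurrent requests can invalidate at any moment.

```java
// Hypothetical decision helper, for illustration only.
final class ImageStorageDecision {

    enum Storage { MEMORY, DISK }

    private ImageStorageDecision() {
    }

    /** Rough snapshot of the heap that could still be allocated right now. */
    static long roughlyAvailableHeap() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    /**
     * Chooses in-memory storage only if the estimate fits into a fraction
     * (safetyFactor, e.g. 0.5) of the available bytes; otherwise falls back to disk.
     */
    static Storage choose(long estimatedBytes, long availableBytes, double safetyFactor) {
        long budget = (long) (availableBytes * safetyFactor);
        return estimatedBytes <= budget ? Storage.MEMORY : Storage.DISK;
    }
}
```

A caller might use `ImageStorageDecision.choose(estimate, ImageStorageDecision.roughlyAvailableHeap(), 0.5)` and switch to a disk-backed image when the result is `DISK`. On a server that handles requests in parallel, a shared budget across all requests would probably be needed in addition to this per-image check.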

Harald K