Reading image header info without loading the entire image

Question

I have a .NET 3.5 application that will be dealing with a large number of images. I need to check that the image extension is correct, and read the image height, width, and PPI. I do not want to load the entire image into a .NET Image or Bitmap; this would take too long and be too resource intensive. I cannot use third-party plug-ins or DLLs, and of course it needs to be done yesterday.

So, I am reading the initial bytes of the files, checking the "magic" numbers to make sure the image extension matches, and then the height and width of the image for most of the image types I need to handle. This is much faster and less resource intensive. I could use a little help reading the PPI from some of the image types, and two of the types have just stumped me beyond validating the extension so far.

For BMP, JPG, GIF, and PNG, I need help reading the PPI.

I am looking for something like "located at offset xx".

For TIF, EPS, and PSD, I need help reading height, width, and PPI.

I am pretty much stuck on EPS and PSD files, and anything would help.
Yes, I know about tiflib; it looks great, but it is way more than I need. A lighter version that handles only the height, width, and PPI would be great. If I have to, I can do this myself, but I'm hoping someone already has :-)

To determine PPI, it won't be as easy as "offset XX" for all formats. For example, the PNG physical pixel dimension is optional (http://www.libpng.org/pub/png/book/chapter11.html#png.ch11.div.8) and is located in a chunk, among other chunks. You'll have to read chunks until you find the pHYs one (if it exists). Other formats don't even store pixel dimensions, or the pixel dimension may be incorrect (but unused by applications, so the file still works). — Simon Mourier, Jan 22 '13 at 08:38
PPI is also dependent on the output display. The file formats that do use PPI typically are for reference to the original output device. — Adam Zuckerman, Jan 22 '13 at 15:33
EPS and PSD are compound storage documents. You won't be able to find what you are looking for at a specific offset. Each image contained may be either a bitmap or vector graphic. There also may be a large number of images in either file type. — Adam Zuckerman, Jan 22 '13 at 15:36
@SimonMourier, This will be an automated filter before a person actually looks at the picture, and we are trying to cut down the number of "bad" photos that make it to that person. If the data is incorrect or doesn't exist, we will catch it in the next step. Looping through the chunks till I either find what I'm looking for or I hit the image data is what I'll have to do for PNGs it looks like. — Loscas, Jan 29 '13 at 20:30
@AdamZuckerman, PPI is used in a few different contexts, and is often mistakenly interchanged with DPI. In this case I am looking at photos, and I am using the resolution and the pixel density (PPI) to evaluate the quality of the image as it was taken. http://www.elizabethhalford.com/editing/pixels-and-dots-the-game/ http://www.andrewdaceyphotography.com/articles/dpi/ — Loscas, Jan 29 '13 at 20:38
Also, it looks like I won't be able to do any validation with the EPS or PSD files. — Loscas, Jan 29 '13 at 20:41

score 4 · Answer 1 · answered Jan 29 '13 at 22:45

All byte locations assume the first byte is in position 1, not 0.

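PNG files Width: bytes 17-20, Height: bytes 21-24 (both big-endian; bytes 1-8 are the file signature and bytes 9-16 are the length and type of the IHDR chunk). PPI: look for the 4-byte chunk type pHYs, which is 112 72 89 115 in decimal. Of the bytes that follow it, bytes 1-4 contain the X pixels per unit, bytes 5-8 contain the Y pixels per unit, and byte 9 contains the unit specifier (0=unknown, 1=meter). When the unit is the meter, multiply by 0.0254 to get PPI. pHYs is an optional chunk and may not exist in all PNGs; when present, it appears before the first IDAT (image data) chunk.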

http://www.libpng.org/pub/png/spec/iso/index-object.htm or http://en.wikipedia.org/wiki/PNG_file_format
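
Walking the chunks is safer than scanning for the pHYs bytes, since it cannot match inside unrelated data. Here is a minimal sketch, in Python for brevity (the same reads translate to a .NET BinaryReader with manual big-endian conversion). The function name `read_png_info` is my own, and offsets in the code are zero-based, so the width sits at offset 16 and the height at offset 20.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_png_info(stream):
    """Return (width, height, ppi_x, ppi_y) from a binary stream.

    The ppi values are None when there is no pHYs chunk or its unit is
    "unknown". A truncated file raises struct.error in this sketch.
    """
    if stream.read(8) != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    width = height = ppi_x = ppi_y = None
    while True:
        header = stream.read(8)              # 4-byte length + 4-byte chunk type
        if len(header) < 8:
            break
        length, chunk_type = struct.unpack(">I4s", header)
        if chunk_type == b"IHDR":
            width, height = struct.unpack(">II", stream.read(8))
            stream.seek(length - 8 + 4, 1)   # skip rest of IHDR plus its CRC
        elif chunk_type == b"pHYs":
            x_ppu, y_ppu, unit = struct.unpack(">IIB", stream.read(9))
            if unit == 1:                    # pixels per meter
                ppi_x, ppi_y = x_ppu * 0.0254, y_ppu * 0.0254
            break
        elif chunk_type in (b"IDAT", b"IEND"):
            break                            # pHYs cannot appear after image data
        else:
            stream.seek(length + 4, 1)       # skip chunk data plus CRC
    return width, height, ppi_x, ppi_y
```

Open the file in binary mode and pass the file object in; only the chunk headers before the image data are read.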

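BMP files Width: bytes 19-22, Height: bytes 23-26, PPI: bytes 39-42 contain the X pixels per meter, bytes 43-46 contain the Y pixels per meter. All four are little-endian 32-bit signed integers, and these positions assume the common 40-byte BITMAPINFOHEADER or a later header version. A negative height indicates a top-down bitmap. Multiply pixels per meter by 0.0254 to get PPI; many writers leave the density fields at 0.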

http://en.wikipedia.org/wiki/BMP_file_format
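
A minimal sketch of the BMP read, in Python for brevity. The function name `read_bmp_info` is my own, and the offsets in the code are zero-based (width at offset 18, X pixels per meter at offset 38). It assumes a BITMAPINFOHEADER-style header of at least 40 bytes and rejects the older 12-byte core header rather than guessing.

```python
import struct

def read_bmp_info(header):
    """Return (width, height, ppi_x, ppi_y) from the first 46+ bytes of a BMP.

    The ppi values are None when the file stores 0 (density not set).
    """
    if len(header) < 46 or header[:2] != b"BM":
        raise ValueError("not a BMP file, or fewer than 46 bytes supplied")
    (dib_size,) = struct.unpack_from("<I", header, 14)
    if dib_size < 40:
        raise ValueError("older core header layout is not handled in this sketch")
    width, height = struct.unpack_from("<ii", header, 18)
    x_ppm, y_ppm = struct.unpack_from("<ii", header, 38)

    def to_ppi(pixels_per_meter):
        return pixels_per_meter * 0.0254 if pixels_per_meter > 0 else None

    # A negative height only signals top-down row order, so report its magnitude.
    return width, abs(height), to_ppi(x_ppm), to_ppi(y_ppm)
```

Reading just the first 46 bytes of the file is enough for this check.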

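JPG files JPEG refers to the compression, while JFIF is the actual file storage format. Width and Height: these are not at a fixed offset. They are stored in the Start Of Frame segment (markers FF C0 through FF CF, excluding FF C4, FF C8, and FF CC): after the 2-byte marker and the 2-byte segment length comes 1 byte of sample precision, then 2 bytes of height, then 2 bytes of width, all big-endian. PPI: in a JFIF file, where the APP0 segment immediately follows the 2-byte start-of-image marker, byte 14 contains the unit (0=no units, 1=pixels per inch, 2=pixels per cm), bytes 15-16 contain the X pixels per unit, and bytes 17-18 contain the Y pixels per unit. Files that begin with an Exif (APP1) segment instead keep the resolution in Exif tags.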

http://en.wikipedia.org/wiki/JPEG_File_Interchange_Format and http://www.ecma-international.org/publications/files/ECMA-TR/TR-098.pdf
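
Because the frame header moves around, a JPEG reader has to hop from segment to segment until it reaches a Start Of Frame marker. A minimal sketch in Python, with a function name of my own (`read_jpeg_info`). It assumes markers follow each other without fill bytes, reads density only from a JFIF APP0 segment, and does not parse Exif.

```python
import struct

def read_jpeg_info(stream):
    """Return (width, height, ppi_x, ppi_y) from a binary stream.

    The ppi values are None when there is no JFIF segment or its unit is 0
    (aspect ratio only). Width and height are None if no frame header is found.
    """
    if stream.read(2) != b"\xff\xd8":                # SOI marker
        raise ValueError("not a JPEG file")
    width = height = unit = x_den = y_den = None
    while True:
        marker = stream.read(2)
        if len(marker) < 2 or marker[0] != 0xFF:
            break
        code = marker[1]
        if code in (0xD9, 0xDA):                     # EOI or start of scan data
            break
        raw_len = stream.read(2)
        if len(raw_len) < 2:
            break
        payload_len = struct.unpack(">H", raw_len)[0] - 2  # length counts itself
        if code == 0xE0:                             # APP0, possibly JFIF
            payload = stream.read(payload_len)
            if payload[:5] == b"JFIF\x00" and len(payload) >= 12:
                unit, x_den, y_den = struct.unpack_from(">BHH", payload, 7)
        elif 0xC0 <= code <= 0xCF and code not in (0xC4, 0xC8, 0xCC):
            frame = stream.read(5)                   # precision, height, width
            if len(frame) == 5:
                _, height, width = struct.unpack(">BHH", frame)
            break
        else:
            stream.seek(payload_len, 1)              # skip segments we do not need

    def to_ppi(density):
        if unit == 1:
            return float(density)                    # already pixels per inch
        if unit == 2:
            return density * 2.54                    # pixels per cm
        return None

    return width, height, to_ppi(x_den), to_ppi(y_den)
```

The loop stops at the frame header, so the compressed image data is never touched.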

GIF files Width: bytes 7-8, Height: bytes 9-10 (both little-endian 16-bit values, immediately after the 6-byte GIF87a or GIF89a signature), PPI: GIF files do not contain any pixel density information.

http://en.wikipedia.org/wiki/Graphics_Interchange_Format
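
GIF is the simplest of the four, since everything needed sits in the first 10 bytes. A minimal Python sketch, with a function name of my own (`read_gif_size`) and zero-based offsets in the code:

```python
import struct

def read_gif_size(header):
    """Return (width, height) of the logical screen from the first 10+ bytes."""
    if len(header) < 10 or header[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file, or fewer than 10 bytes supplied")
    # Two little-endian unsigned 16-bit values directly after the signature.
    return struct.unpack_from("<HH", header, 6)
```

The same signature comparison doubles as the "magic number" check for the file extension.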

I have supplied links to the other formats as they require specific knowledge of the format to determine if or where the information you requested is stored.

http://partners.adobe.com/public/developer/tiff/index.html

http://en.wikipedia.org/wiki/Portable_Document_Format and http://www.adobe.com/devnet/pdf/pdf_reference_archive.html

http://www.adobe.com/devnet-apps/photoshop/fileformatashtml/

In my own experiment the PNG Width was at 16 (decimal) and Height was at 20. — Sten Petrov, Dec 26 '15 at 05:29

score -1 · Answer 2 · edited May 23 '17 at 12:23

Rather than spending hundreds of hours of development time writing and debugging your own multi-format image parser, I would suggest that you look at ways to optimize existing methods. While some image formats are easy, others are hard. Some are really hard. As was mentioned, some "formats" are just containers for other formats.

Here are some suggestions:

Speed up loading an image from disk in a windows forms (c#.net) app

http://www.vcskicks.com/fast-image-processing.php

How can I find the pixel per inch value in a JPG image?

answered Jan 28 '13 at 22:41

lfalin

As I mentioned in the original post, loading the entire image to retrieve these three pieces of information about the image is resource intensive. Why load an entire 10 or 25 MB file when you only need to read a few bytes? We have resource and performance concerns, and taking the time to build this is worth it to my client. – Loscas Jan 29 '13 at 19:48
