26

I'd like to extract the thumbnail image from JPEG files without any external library. This shouldn't be too difficult: I only need to know where the thumbnail starts and ends in the file, and then copy those bytes out. I have studied a lot of documentation (e.g. http://www.media.mit.edu/pia/Research/deepview/exif.html) and tried to analyze JPEG files, but not everything is clear. I tried to trace the bytes step by step, but I got lost in the details. Is there any good documentation or readable source code that shows how to find the start and end positions of the thumbnail within a JPEG file?

Thank you!

BitBank
Lay András
  • 2
    There are at least 3 places which could store thumbnails for JPEG images: JFIF/APP0, EXIF APP1, and Adobe APP13. Here http://javagraphics.blogspot.ca/2010/03/images-reading-jpeg-thumbnails.html is a blog post about it, and you may also find this https://github.com/dragon66/icafe/wiki useful. – dragon66 Oct 06 '14 at 21:04

4 Answers

33

Exiftool is very capable of doing this quickly and easily:

exiftool -b -ThumbnailImage my_image.jpg > my_thumbnail.jpg
Riot
21

For most JPEG images created by phones or digital cameras, the thumbnail image (if present) is stored in the APP1 marker segment (FFE1). Inside this segment is a TIFF structure containing the EXIF information for the main image and the optional thumbnail, which is stored as a JPEG-compressed image. The TIFF structure usually contains two "pages" (IFDs): the first holds the EXIF info and the second holds the thumbnail in the "old" TIFF compression type 6 format, in which a JPEG file is stored as-is inside a TIFF wrapper. If you want the simplest possible code to extract the thumbnail as a JPEG file, follow these steps:

  1. Familiarize yourself with JPEG markers and TIFF tags. A JPEG marker consists of two bytes: 0xFF followed by the marker type (0xE1 for APP1). These two bytes are followed by a two-byte segment length stored in big-endian order; the length counts its own two bytes but not the marker. For TIFF files, consult the Adobe TIFF 6.0 reference.
  2. Search your JPEG file for the APP1 (FFE1) EXIF marker. There may be multiple APP1 markers, and there may be other markers before the APP1.
  3. The APP1 segment you're looking for contains the identifier "Exif" followed by two zero bytes, immediately after the length field.
  4. Look for "II" or "MM" (6 bytes after the length field, right behind the identifier) to determine the endianness used in the TIFF data. II = Intel = little endian, MM = Motorola = big endian.
  5. Skip through the first page's tags to find the second IFD, where the thumbnail is described; its offset is stored in the 4 bytes that follow the first IFD's entries. In the second "page", look for the two TIFF tags which point to the JPEG data. Tag 0x201 has the offset of the JPEG data (relative to the II/MM) and tag 0x202 has the length in bytes.
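
The steps above can be sketched roughly as follows. This is a minimal illustration rather than a hardened parser: the function names extractExifThumbnail and readTiffThumbnail are hypothetical, and the code assumes the whole EXIF segment is already in memory as a Uint8Array, that no fill bytes sit between segments, and that tags 0x201 and 0x202 are stored as 4-byte LONG values.

```javascript
// Hypothetical helper: returns the EXIF thumbnail bytes as a Uint8Array,
// or null when no EXIF thumbnail is found.
function extractExifThumbnail(bytes) {
    var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    if (view.byteLength < 4 || view.getUint16(0) !== 0xFFD8) return null; // no SOI
    var pos = 2;
    // Walk the marker segments instead of scanning for byte patterns.
    while (pos + 4 <= view.byteLength && view.getUint8(pos) === 0xFF) {
        var marker = view.getUint8(pos + 1);
        if (marker === 0xDA || marker === 0xD9) break; // SOS or EOI: headers are over
        var length = view.getUint16(pos + 2);          // big-endian, counts itself
        if (length < 2) break;                         // malformed segment
        var segmentEnd = pos + 2 + length;
        if (marker === 0xE1 && length >= 16 && segmentEnd <= view.byteLength &&
                view.getUint32(pos + 4) === 0x45786966 && // "Exif"
                view.getUint16(pos + 8) === 0) {          // two zero bytes
            var thumb = readTiffThumbnail(view, pos + 10, segmentEnd);
            if (thumb) return thumb;
        }
        pos = segmentEnd;
    }
    return null;
}

// tiff = absolute position of the "II"/"MM" header, end = end of the segment.
function readTiffThumbnail(view, tiff, end) {
    var order = view.getUint16(tiff);
    if (order !== 0x4949 && order !== 0x4D4D) return null;
    var le = order === 0x4949;                         // "II" = little endian
    if (view.getUint16(tiff + 2, le) !== 42) return null;
    var ifd0 = tiff + view.getUint32(tiff + 4, le);
    if (ifd0 + 2 > end) return null;
    // Each IFD entry is 12 bytes; the pointer to the next IFD follows them.
    var nextPtr = ifd0 + 2 + view.getUint16(ifd0, le) * 12;
    if (nextPtr + 4 > end) return null;
    var ifd1Offset = view.getUint32(nextPtr, le);
    if (ifd1Offset === 0) return null;                 // no second IFD, no thumbnail
    var ifd1 = tiff + ifd1Offset;
    if (ifd1 + 2 > end) return null;
    var count = view.getUint16(ifd1, le), offset = 0, size = 0;
    if (ifd1 + 2 + count * 12 > end) return null;
    for (var i = 0; i < count; i++) {
        var entry = ifd1 + 2 + i * 12;
        var tag = view.getUint16(entry, le);
        if (tag === 0x0201) offset = view.getUint32(entry + 8, le); // JPEG data offset
        if (tag === 0x0202) size = view.getUint32(entry + 8, le);   // JPEG data length
    }
    if (!offset || !size || tiff + offset + size > end) return null;
    return new Uint8Array(view.buffer, view.byteOffset + tiff + offset, size);
}
```

In a browser, the result could be wrapped with new Blob([thumb], {type: "image/jpeg"}) after reading the first part of the file into an ArrayBuffer.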
Giacomo1968
BitBank
  • 4
    Might also point out that there can be more than one reduced-resolution image in the Exif data. For example, in Nikon JPEG files there is a thumbnail and a second (larger) preview image. The only restriction is that the total Exif data cannot exceed roughly 64 KB, because the segment's 16-bit length field caps it at 65,535 bytes. Another point: the Exif data can be little endian or big endian, as you say. However, the JPEG markers and data and the thumbnail data are always big endian. Markers like 0xFFE1 (APP1 marker) are defined by the JPEG standard ISO DIS 10918-1, which is available online. – Jim Merkel Apr 30 '12 at 19:09
7

There is a much simpler solution for this problem, but I don't know how reliable it is: Start reading the JPEG file from the third byte and search for FFD8 (start of JPEG image marker), then for FFD9 (end of JPEG image marker). Extract it and voila, that's your thumbnail.

A simple JavaScript implementation:

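function getThumbnail(file, callback) {
    if (file.type == "image/jpeg") {
        var reader = new FileReader();
        reader.onload = function (e) {
            var array = new Uint8Array(e.target.result),
                start = -1, end = -1;
            // Start at byte 2 to skip the main image's own FFD8 marker.
            for (var i = 2; i + 1 < array.length; i++) {
                if (array[i] == 0xFF) {
                    if (start < 0) {
                        if (array[i + 1] == 0xD8) {
                            start = i;
                        }
                    } else {
                        if (array[i + 1] == 0xD9) {
                            // subarray() excludes its end index, so step
                            // past both bytes of the FFD9 marker.
                            end = i + 2;
                            break;
                        }
                    }
                }
            }
            if (start >= 0 && end > start) {
                callback(new Blob([array.subarray(start, end)], {type: "image/jpeg"}));
            } else {
                // TODO scale with canvas
            }
        };
        // An EXIF segment can be close to 64 KB and other segments may
        // precede it, so read more than that (128 KB is an arbitrary margin).
        reader.readAsArrayBuffer(file.slice(0, 128 * 1024));
    } else if (file.type.indexOf("image/") === 0) {
        // TODO scale with canvas
    }
}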
Joel
  • 1
    Nice simple code for a proof of concept, but this breaks for about 1/20 photos I have, because I don't think you can guarantee that 0xFFD8 doesn't appear elsewhere. – Gordon Williams Mar 16 '16 at 08:57
  • shouldn't end be i+1 to include the 0xD9? – Mattis Dec 17 '18 at 01:50
  • This code will sometimes produce partially rendered thumbnails. To fix it change "end = i" to "end = i + 2". Also the file.slice should be increased from 50000 to a higher value, since maximum exif data size is 64k. Demo: https://vitali-fedulov.github.io/similar.pictures/jpeg-thumbnail-reader.html – Similar pictures Jun 13 '22 at 19:07
-1

The Wikipedia page on JFIF at http://en.wikipedia.org/wiki/JPEG_File_Interchange_Format gives a good description of the JPEG header (in JFIF, the header can contain the thumbnail as an uncompressed raster image). That should give you an idea of the layout and thus the code needed to extract the info.

Hexdump of an image header (hexdump prints 16-bit little-endian words, so each pair of bytes appears swapped):

sdk@AndroidDev:~$ head -c 48 stfu.jpg |hexdump
0000000 d8ff e0ff 1000 464a 4649 0100 0101 4800
0000010 4800 0000 e1ff 1600 7845 6669 0000 4d4d
0000020 2a00 0000 0800 0000 0000 0000 feff 1700

Reading the dump in file order (undoing the byte swap): image magic FFD8 (bytes 0-1), APP0 segment marker FFE0 (bytes 2-3), header length (bytes 4-5, big endian), header type signature ("JFIF\0" or "JFXX\0", bytes 6-10), version (bytes 11-12), density units (byte 13), X density (bytes 14-15), Y density (bytes 16-17), thumbnail width (byte 18), thumbnail height (byte 19), and finally the rest, up to "Header Length", is the thumbnail data.

From the above example, you can see that the header length is 16 bytes (bytes 4-5) and the version is 1.01 (bytes 11-12). Further, as the thumbnail width and thumbnail height are both 0x00, this JFIF header doesn't contain a thumbnail.
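
As a rough illustration of that layout, the sketch below decodes the fixed JFIF APP0 fields from the first 20 bytes of a file. The function name parseJfifHeader and the shape of the returned object are made up for this example, and it only handles a "JFIF\0" segment placed directly after the FFD8 marker, not the JFXX extension.

```javascript
// Hypothetical helper: decodes the JFIF APP0 header at the start of a JPEG.
// Returns null when the file does not begin with FFD8 FFE0 ... "JFIF\0".
function parseJfifHeader(bytes) {
    var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    if (view.byteLength < 20) return null;
    if (view.getUint16(0) !== 0xFFD8 || view.getUint16(2) !== 0xFFE0) return null;
    var id = String.fromCharCode(view.getUint8(6), view.getUint8(7),
                                 view.getUint8(8), view.getUint8(9));
    if (id !== "JFIF" || view.getUint8(10) !== 0) return null;
    var width = view.getUint8(18), height = view.getUint8(19);
    return {
        length: view.getUint16(4),        // big endian, counts its own 2 bytes
        version: view.getUint8(11) + "." + ("0" + view.getUint8(12)).slice(-2),
        units: view.getUint8(13),         // 0 = aspect ratio only, 1 = dpi, 2 = dots/cm
        xDensity: view.getUint16(14),
        yDensity: view.getUint16(16),
        thumbWidth: width,
        thumbHeight: height,
        thumbBytes: 3 * width * height    // uncompressed 24-bit RGB, starts at byte 20
    };
}
```

Fed the first 20 bytes of the dump above, this should report a length of 16, version 1.01, a density of 72 x 72 dpi and a 0 x 0 thumbnail.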

Samveen
  • 1
    Your analysis of the JFIF header is incorrect. JPEG files usually contain JPEG-compressed thumbnail images. The thumbnail width and height are stored in the APP1 marker as part of a TIFF file. You can see in your dump at offset 0x1E the start of the TIFF header "MM" followed by version 0x2a and IFD offset 0x0008. – BitBank Apr 27 '12 at 18:00
  • My analysis is based on the info found on http://en.wikipedia.org/wiki/JPEG_File_Interchange_Format as well as the jpeg standard http://www.ecma-international.org/publications/files/ECMA-TR/TR-098.pdf Section 10 (page 5). Please elaborate more on your source of information. You're probably talking of the JFIF extension (JFXX) segment format, while the above example is of the JFIF segment format (bytes offset 0x06-0x10 are "JFIF\0") – Samveen Apr 27 '12 at 19:58
  • 2
    The thumbnail info may be in the spec, but that's not how it's used in the real world. I have never seen a JPEG image with the thumbnail in the APP0 header. It is stored (usually compressed) in the EXIF (APP1) header as part of a TIFF file which contains the other EXIF info as TIFF tags. Post the file you reference above and I'll tell you what's in it. – BitBank Apr 27 '12 at 20:29
  • As dragon66 says in another comment, [there's at least three places in a jpeg where a thumbnail may be](https://stackoverflow.com/questions/10349622/extract-thumbnail-from-jpeg-file#comment41131041_10349622). JFIF (APP0) thumbnails are rare but they exist. EXIF (APP1) and Photoshop (APP13) thumbnails are more common. – hippietrail Aug 17 '23 at 22:24