8

I have a few different JPEG images I've been testing with. As far as I've seen the 0th and first bytes are always 0xFF and 0xD8.

The second and third are usually either 0xFF and 0xE0 ( APP0 ) indicating either a JFIF segment or JFIF extension segment or 0xFF and 0xE1 ( APP1 ) indicating an EXIF segment.

My question: is this always the case? Are the 2nd and 3rd bytes always APP0 or APP1?

hippietrail
  • 15,848
  • 18
  • 99
  • 158
KeatsKelleher
  • 10,015
  • 4
  • 45
  • 52
  • Thanks Andy... That was my roundabout way of bragging that this is being done in JavaScript ;) – KeatsKelleher Mar 23 '11 at 23:49
  • heh, no worries - I've used *FileSystemObject* to create a zip file from the first few bytes before. Props for creating a whole JPEG file :-) – Andy E Mar 24 '11 at 10:14

8 Answers8

11

No. There are e.g. several cameras that create JPEGs without these markers, or with other APP markers. The only thing you can rely on is the SOI sequence, FF D8, not even EOI is produced by all cameras. Also be aware that JPEGs with embedded JPEGs exist - you can have nested SOI/EOI within an image.

If you need to deal with embedded JPEG data in raw camera images, several models produce JPEG-like data that can only be parsed by being a bit slack with the jpeg spec - especially in relation to escaped FF bytes in data. And then you have cameras that produce proprietary data that at first glance looks like jpeg data (e.g. some of Sony's "encrypted" raw formats)

Erik
  • 88,732
  • 13
  • 198
  • 189
  • 3
    Side note: More than 99% of my sample JPEGs (from various cameras) *do* have APP0/APP1 - the exceptions I mention are not common, and you don't really need to deal with them in a generic parser – Erik Mar 23 '11 at 23:57
  • 3
    I don't really see why you would want to hard-code into your parser to expect that data, when you could just program it to read one segment at a time and deal with it based on its header. – mgiuca Mar 24 '11 at 00:06
3

Things are complicated here. Since I am currently writing a javascript file identifier, I'll try to answer with my javascript object for JPEG. Especially because the question had a "javascript" tag.

The basic answer is already given (the accepted one) but this is more detailed about how to check the different App markers (with fallback).

There are special APP0s so far for JFIF, EXIF, Adobe, Canon and Samsung (but we don't know about the future). So the logic for the js object is:

If one of the SPECS[x].regex is matched it wins (first one wins). But if nothing is matched, the parent object (only FFd8) wins.

The SPECS object delivers according PRONOM identifiers - you can view them like so

'http://apps.nationalarchives.gov.uk/pronom/fmt/'.concat(PUID) [official] 'http://apps.nationalarchives.gov.uk/pronom/x-fmt/'.concat(xPUID) [experimental]

_FFD8: { 
    SPECS: [
        { 
            PUID: 112, 
            regex: /^FFD8FFE8(.{2})53504946460001/, 
            desc: 'jpeg: Still Picture Interchange Format file (SPIF)',
            regexCapture: [
                { key: 'recordedSignature' }, 
                { key: 'segmentLength', fn: function(h){ return { value:parseInt(h, 16), _val:h.toString() }; } }
            ], 
            valueCapture: {
                version: '1.00'
            }
        }, 
        { 
            PUID: 44, 
            regex: /^FFD8FFE0(.{2})4A464946000102/, 
            desc: 'jpeg: JPEG File Interchange Format file (JFIF), v. 1.02',
            regexCapture: [
                { key: 'recordedSignature' }, 
                { key: 'segmentLength', fn: function(h){ return { value:parseInt(h, 16), _val:h.toString() }; } }
            ], 
            valueCapture: {
                version: '1.02',
            }
        },
        { 
            PUID: 43, 
            regex: /^FFD8FFE0(.{2})4A464946000101/, 
            desc: 'jpeg: JPEG File Interchange Format file (JFIF), v. 1.01',
            regexCapture: [
                { key: 'recordedSignature' }, 
                { key: 'segmentLength', fn: function(h){ return { value:parseInt(h, 16), _val:h.toString() }; } }
            ], 
            valueCapture: {
                version: '1.01',
            }
        },
        { 
            PUID: 42, 
            regex: /^FFD8FFE0(.{2})4A464946000100/, 
            desc: 'jpeg: JPEG File Interchange Format file (JFIF), v. 1.00',
            regexCapture: [
                { key: 'recordedSignature' }, 
                { key: 'segmentLength', fn: function(h){ return { value:parseInt(h, 16), _val:h.toString() }; } }
            ], 
            valueCapture: {
                version: '1.00',
            }
        },          
        { 
            PUID: 41, 
            xPUID: 398,
            regex: /^FFD8FFE1(.{2})45786966000049492A00(.+)009007000400000030323030/, 
            desc: 'jpeg: JPG Image File, using Exchangeable Image File Format (Exif), little endian, v. 2.0',
            regexCapture: [
                { key: 'recordedSignature' }, 
                { key: 'segmentLength', fn: function(h){ return { value:parseInt(h, 16), _val:h.toString() }; } }
            ], 
            valueCapture: {
                endian: 'little',
                version: '2.0',
            }
        }, 
        { 
            PUID: 41, 
            xPUID: 398, 
            regex: /^FFD8FFE1(.{2})4578696600004D4D002A(.+)900000070000000430323030/, 
            desc: 'jpeg: JPG Image File, using Exchangeable Image File Format (Exif), big endian, v. 2.0',
            regexCapture: [
                { key: 'recordedSignature' }, 
                { key: 'segmentLength', fn: function(h){ return { value:parseInt(h, 16), _val:h.toString() }; } }
            ], 
            valueCapture: {
                endian: 'big',
                version: '2.0',
            }
        },              
        { 
            PUID: 41, 
            xPUID: 390,
            regex: /^FFD8FFE1(.{2})45786966000049492A00(.+)009007000400000030323130/, 
            desc: 'jpeg: JPG Image File, using Exchangeable Image File Format (Exif), little endian, v. 2.1',
            regexCapture: [
                { key: 'recordedSignature' }, 
                { key: 'segmentLength', fn: function(h){ return { value:parseInt(h, 16), _val:h.toString() }; } }
            ], 
            valueCapture: {
                endian: 'little',
                version: '2.1',
            }
        }, 
        { 
            PUID: 41, 
            xPUID: 390, 
            regex: /^FFD8FFE1(.{2})4578696600004D4D002A(.+)900000070000000430323130/, 
            desc: 'jpeg: JPG Image File, using Exchangeable Image File Format (Exif), big endian, v. 2.1',
            regexCapture: [
                { key: 'recordedSignature' }, 
                { key: 'segmentLength', fn: function(h){ return { value:parseInt(h, 16), _val:h.toString() }; } }
            ], 
            valueCapture: {
                endian: 'big',
                version: '2.1',
            }
        },          
        { 
            PUID: 41, 
            xPUID: 391,
            regex: /^FFD8FFE1(.{2})45786966000049492A00(.+)009007000400000030323230/, 
            desc: 'jpeg: JPG Image File, using Exchangeable Image File Format (Exif), little endian, v. 2.2',
            regexCapture: [
                { key: 'recordedSignature' }, 
                { key: 'segmentLength', fn: function(h){ return { value:parseInt(h, 16), _val:h.toString() }; } }
            ], 
            valueCapture: {
                endian: 'little',
                version: '2.2',
            }
        }, 
        { 
            PUID: 41, 
            xPUID: 391, 
            regex: /^FFD8FFE1(.{2})4578696600004D4D002A(.+)900000070000000430323230/, 
            desc: 'jpeg: JPG Image File, using Exchangeable Image File Format (Exif), big endian, v. 2.2',
            regexCapture: [
                { key: 'recordedSignature' }, 
                { key: 'segmentLength', fn: function(h){ return { value:parseInt(h, 16), _val:h.toString() }; } }
            ], 
            valueCapture: {
                endian: 'big',
                version: '2.2',
            }
        }, 
        // specific JPEG (all begin with FFD8FF, map them to PUID 41)
        { 
            PUID: 41, 
            regex: /^FFD8FFED/, 
            desc: 'jpeg: JPG Image File, Adobe JPEG, Photoshop CMYK buffer'
        }, 
        { 
            PUID: 41, 
            regex: /^FFD8FFE2/, 
            desc: 'jpeg: JPG Image File, Canon JPEG, Canon EOS-1D'
        }, 
        { 
            PUID: 41, 
            regex: /^FFD8FFE3/, 
            desc: 'jpeg: JPG Image File, Samsung JPEG, e.g. Samsung D500'
        }, 
        { 
            PUID: 41, 
            regex: /^FFD8FFDB/, 
            desc: 'jpeg: JPG Image File, Samsung JPEG, e.g. Samsung D807'
        }
    ],
    ext: ['JPG', 'JPE', 'JPEG', 'SPF', 'SPIFF'],
    signature: [ 255, 216 ],
    desc: 'jpeg: JPEG File Interchange Format file, App0 marker not known',
    mime: 'image/jpeg',
    specifications: [
        { text:'Specification for the JFIF file format', href:'http://www.w3.org/Graphics/JPEG/jfif3.pdf', type:'W3', format:'pdf' },
        { text:'The JPEG compression specification', href:'http://www.w3.org/Graphics/JPEG/itu-t81.pdf', type:'W3', format:'pdf' },
        { text:'Exchangeable image file format for digital still cameras', href:'http://home.jeita.or.jp/tsc/std-pdf/CP3451C.pdf', type:'vendor', format:'pdf' }
    ], 
    references: [
        { text:'JPEG JFIF W3 Info', href:'http://www.w3.org/Graphics/JPEG/', type:'W3', format:'html' },
        { text:'JPEG.org', href:'http://www.jpeg.org/', type:'info', format:'html' },
        { text:'JPEG Exif App markers', href:'http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/JPEG.html', type:'info', format:'html'}
    ]
}
sebilasse
  • 4,278
  • 2
  • 37
  • 36
  • Good answer. The whole file signature/magic-byte topic is really vague. Where did you get that json list? I don't see it on the links you posted. – Andy B Feb 26 '18 at 17:19
2

Those first two bytes are the JPEG SOI marker so are always present.

The 2nd and 3rd bytes appear to store meta data, which probably isn't present in every JPEG.

Further Reading.

alex
  • 479,566
  • 201
  • 878
  • 984
  • Right, the SOI marker will always be there... This question was about the APP0 and APP1 markers, though. In every case I've seen they come next. I'm wondering if this is necessarily the case or if it's just usually the case. – KeatsKelleher Mar 23 '11 at 23:46
  • @akellehe I've updated my answer. What images were you trying? All from one set or from a number of sources? – alex Mar 23 '11 at 23:46
  • They're from a number of different sources verified to use different versions of the Exif standard ( JFIF too ). I got them here: http://exif.org/samples.html – KeatsKelleher Mar 23 '11 at 23:51
2

No, it certainly doesn't have to be that way. Reading Wikipedia.

As far as I can tell, the APPn segments are just ways for applications to embed arbitrary data into the image file. Obviously, applications commonly take advantage of this and write 0xFF 0xEO or 0xFF 0xE1 bytes into the header, but it would be entirely plausible for an application to not do this and just go on with the image data. The first two bytes (0xFF and 0xD8) are mandatory, as they are the SOI (start-of-image) marker.

mgiuca
  • 20,958
  • 7
  • 54
  • 70
1

No. There are programs that remove some markers. ImageOptim is such a program. You only need some of the markers. This program will also optimise the huffman tables

1

Typically yes, however my understanding of JPEG is that any segment type can follow the header.

fbrereto
  • 35,429
  • 19
  • 126
  • 178
1

For a JFIF file, a JFIF header should follow immediately after FFD8. The JFIF header is contained within an APP0 marker. The specification doesn't say anything about padding though.

Without the JFIF header we can only guess what color format is used.

onemasse
  • 6,514
  • 8
  • 32
  • 37
1

In theory, yes. According to the JFIF spec (pdf), its APP0 section should come first in the file. And the Exif spec (pdf) requires the same for its APP1 section.

But you shouldn't count on the order (or even the existance of) APPn sections; there are crazy JPEG writers out there. Start with the SOI and read the sections as they come by.

Ozgur Ozcitak
  • 10,409
  • 8
  • 46
  • 56