
... or should I go deeper into the data stream looking for the 0xFF 0xD8 sequence?

From this Q, I've learned that APPn does not have to follow SOI immediately. Are there specification-compliant JPEG streams where the SOI is not at the very beginning of the stream?


A quote from the specification (Annex B, § 1.1.2):

Markers serve to identify the various structural parts of the compressed data formats. Most markers start marker segments containing a related group of parameters; some markers stand alone. All markers are assigned two-byte codes: an X’FF’ byte followed by a byte which is not equal to 0 or X’FF’ (see Table B.1). Any marker may optionally be preceded by any number of fill bytes, which are bytes assigned code X’FF’.
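To make the quoted rule concrete, here is a minimal Python sketch of reading one marker while tolerating leading fill bytes. The function name `read_marker` and its return shape are made up for illustration; this is not taken from any real decoder.

```python
from typing import Tuple


def read_marker(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (marker_code, offset_after_marker) for the marker at `pos`.

    Hypothetical helper: any run of 0xFF bytes before the code byte is
    treated as fill, as the quoted paragraph of Annex B permits.
    """
    if pos >= len(data) or data[pos] != 0xFF:
        raise ValueError("no marker at this offset")
    # Skip the 0xFF prefix together with any additional 0xFF fill bytes.
    while pos < len(data) and data[pos] == 0xFF:
        pos += 1
    if pos >= len(data):
        raise ValueError("stream ended inside fill bytes")
    code = data[pos]
    if code == 0x00:
        # The second byte of a marker must not be 0 (nor 0xFF).
        raise ValueError("0xFF 0x00 is not a marker")
    return code, pos + 1
```

With this reading of the rule, `b"\xff\xd8"` and `b"\xff\xff\xff\xd8"` both yield the SOI code 0xD8; only the returned offset differs.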

Free Consulting

1 Answer


libjpeg does not allow garbage before the SOI:

/* Like next_marker, but used to obtain the initial SOI marker. */
/* For this marker, we do not allow preceding garbage or fill; otherwise,
* we might well scan an entire input file before realizing it ain't JPEG.
* If an application wants to process non-JFIF files, it must seek to the
* SOI before calling the JPEG library.
*/

From: Random libjpeg mirror.

The Go implementation, for example, also does not allow preceding garbage.
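In code, the strict behaviour described above boils down to a two-byte comparison at offset 0. A minimal Python sketch (the name `starts_with_soi` is made up here, and this is not a port of either library):

```python
SOI = b"\xff\xd8"  # Start Of Image marker


def starts_with_soi(stream: bytes) -> bool:
    """Strict check: the first two bytes must be exactly 0xFF 0xD8.

    Leading fill bytes or garbage are rejected, mirroring the behaviour
    the libjpeg comment describes for the initial SOI.
    """
    return stream[:2] == SOI
```

Note that under this check even a single leading 0xFF fill byte (`b"\xff\xff\xd8"`) is rejected.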

However, if in doubt, stick to Postel's Law:

Be liberal in what you accept, and conservative in what you send

That said, you don't want to be too liberal, or you might end up extracting the embedded EXIF thumbnail (or something like that) from the stream instead of the actual JPEG.
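One way to be liberal but not too liberal is to search for the SOI only within a small window at the start of the stream. A minimal Python sketch; the name `find_soi` and the default of 64 bytes are arbitrary assumptions for illustration, not values from the spec or from any library:

```python
def find_soi(data: bytes, max_skip: int = 64) -> int:
    """Return the offset of the first 0xFF 0xD8 pair, or -1.

    At most `max_skip` leading bytes (fill bytes or garbage) are
    tolerated, so a stream that lacks a SOI near its start is rejected
    instead of being scanned all the way to some later embedded image.
    """
    # bytes.find only matches pairs lying entirely before the end index,
    # so the SOI may start no later than offset `max_skip`.
    return data.find(b"\xff\xd8", 0, max_skip + 2)
```

A caller would then hand `data[offset:]` to the decoder when the result is not -1. Keeping `max_skip` small is what guards against landing on a thumbnail's SOI deep inside the file.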

nmaier
  • 1
    It is pretty logical that random garbage bytes at the beginning invalidate the whole stream, but I was talking about padding bytes, which seem to be allowed by the spec (see the edit). – Free Consulting Sep 03 '13 at 03:17
  • 1
    Yeah, the spec might say so, but many implementations including the reference implementation handle SOI differently (I gave libjpeg, go as examples). However, fill bytes (and sometimes even random garbage) preceding the marker are gladly accepted for all other markers in most implementations I encountered... up to a certain point at least. – nmaier Sep 03 '13 at 04:31