38

I'd like to screen some jpegs for validity before I send them across the network for more extensive inspection. It is easy enough to check for a valid header and footer, but what is the smallest size (in bytes) a valid jpeg could be?

twk
  • 3
    libjpeg can do quick tests, consider using it rather than guessing. – Tronic Feb 12 '10 at 16:41
  • 4
    I don't want to add any extra libraries to my app. Also, it isn't guessing if someone tells me the right answer :) – twk Feb 12 '10 at 16:45
  • 1
    You should probably change your question to "test if some jpegs are probably valid" unless you're going to do a bunch of other tests if the file size test passes. Otherwise it would be fairly easy to produce an invalid JPEG of any size over the minimum size of a valid JPEG. – jball Feb 12 '10 at 17:02
  • @jball, good idea -- I've clarified the question. – twk Feb 12 '10 at 18:09

7 Answers

31

A 1x1 grey pixel fits in 125 bytes using arithmetic coding, which is still part of the JPEG standard even though most decoders can't decode it:

ff d8 ; SOI
ff e0 ; APP0
 00 10
 4a 46 49 46 00 01 01 01 00 48 00 48 00 00
ff db ; DQT
 00 43
 00
 03 02 02 02 02 02 03 02
 02 02 03 03 03 03 04 06
 04 04 04 04 04 08 06 06
 05 06 09 08 0a 0a 09 08
 09 09 0a 0c 0f 0c 0a 0b
 0e 0b 09 09 0d 11 0d 0e
 0f 10 10 11 10 0a 0c 12
 13 12 10 13 0f 10 10 10
ff c9 ; SOF
 00 0b
 08 00 01 00 01 01 01 11 00
ff cc ; DAC
 00 06 00 10 10 05
ff da ; SOS
 00 08
 01 01 00 00 3f 00 d2 cf 20
ff d9 ; EOI

I don't think the 134-byte example mentioned in another answer is standard, as it is missing an EOI marker. Decoders generally tolerate this, but the standard says the file should end with one.

That file can be generated with:

#!/usr/bin/env bash
printf '\xff\xd8' # SOI
printf '\xff\xe0' # APP0
printf  '\x00\x10'
printf  '\x4a\x46\x49\x46\x00\x01\x01\x01\x00\x48\x00\x48\x00\x00'
printf '\xff\xdb' # DQT
printf  '\x00\x43'
printf  '\x00'
printf  '\x03\x02\x02\x02\x02\x02\x03\x02'
printf  '\x02\x02\x03\x03\x03\x03\x04\x06'
printf  '\x04\x04\x04\x04\x04\x08\x06\x06'
printf  '\x05\x06\x09\x08\x0a\x0a\x09\x08'
printf  '\x09\x09\x0a\x0c\x0f\x0c\x0a\x0b'
printf  '\x0e\x0b\x09\x09\x0d\x11\x0d\x0e'
printf  '\x0f\x10\x10\x11\x10\x0a\x0c\x12'
printf  '\x13\x12\x10\x13\x0f\x10\x10\x10'
printf '\xff\xc9' # SOF
printf  '\x00\x0b'
printf  '\x08\x00\x01\x00\x01\x01\x01\x11\x00'
printf '\xff\xcc' # DAC
printf  '\x00\x06\x00\x10\x10\x05'
printf '\xff\xda' # SOS
printf  '\x00\x08'
printf  '\x01\x01\x00\x00\x3f\x00\xd2\xcf\x20'
printf '\xff\xd9' # EOI

and opened fine with GNOME Image Viewer 3.38.0 and GIMP 2.10.18 on Ubuntu 20.10.

Alternative way of generating this image:
echo ffd8ffe000104a46494600010101004800480000ffdb004300030202020202030202020303030304060404040404080606050609080a0a090809090a0c0f0c0a0b0e0b09090d110d0e0f101011100a0c12131210130f101010ffc9000b080001000101011100ffcc000600101005ffda0008010100003f00d2cf20ffd9 | xxd -r -p > small.jpg
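
For anyone who wants to check the layout rather than just generate the file, here is a minimal Python sketch that rebuilds the same 125 bytes from the hex dump above and walks the marker segments. The names `SMALL_JPEG`, `MARKER_NAMES` and `walk_segments` are made up for this illustration, and the walker only knows the markers used in this file; it assumes the entropy-coded data runs up to the first `ff d9`, which holds here but is not a general-purpose parser.

```python
import struct

# The 125-byte arithmetic-coded JPEG, assembled from the hex dump above.
SMALL_JPEG = bytes.fromhex(
    "ffd8"                                         # SOI
    "ffe0" "0010" "4a46494600010101004800480000"   # APP0 (JFIF)
    "ffdb" "0043" "00"                             # DQT, table 0
    "0302020202020302" "0202030303030406"
    "0404040404080606" "050609080a0a0908"
    "09090a0c0f0c0a0b" "0e0b09090d110d0e"
    "0f101011100a0c12" "131210130f101010"
    "ffc9" "000b" "080001000101011100"             # SOF9
    "ffcc" "0006" "00101005"                       # DAC
    "ffda" "0008" "010100003f00" "d2cf20"          # SOS header + entropy-coded data
    "ffd9"                                         # EOI
)

# Only the markers that occur in this particular file.
MARKER_NAMES = {0xD8: "SOI", 0xE0: "APP0", 0xDB: "DQT", 0xC9: "SOF9",
                0xCC: "DAC", 0xDA: "SOS", 0xD9: "EOI"}


def walk_segments(data):
    """Yield (name, offset, total_bytes) for each marker segment in data."""
    pos = 0
    while pos < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        name = MARKER_NAMES.get(code, f"0x{code:02x}")
        if code in (0xD8, 0xD9):
            size = 2                      # SOI and EOI carry no length field
        else:
            (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
            size = 2 + length             # the length field counts itself
            if code == 0xDA:
                # Simplification: treat everything up to EOI as scan data.
                end = data.find(b"\xff\xd9", pos + size)
                size = (end if end != -1 else len(data)) - pos
        yield name, pos, size
        pos += size


for name, offset, size in walk_segments(SMALL_JPEG):
    print(f"{name:5} offset={offset:3} bytes={size}")
print("total:", len(SMALL_JPEG))
```

The per-segment sizes add up to the 125 bytes quoted in this answer, with the JFIF APP0 segment accounting for 18 of them.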

Here's an upload on Imgur. Note that Imgur processes the file, making it larger, so a downloaded copy will not match the bytes above if you check. Displayed at width=100, the image shows white on Chromium 87.

StackzOfZtuff
matja
  • 4
    Which of these bytes are safe to increment to produce a series of small but different JPEGs? – Quolonel Questions Jul 14 '13 at 19:07
  • @Quolonel Questions - The 8x8 'square' of bytes in the DQT segment are essentially scaling factors, any of which can be values 1-255. I think the only value which is used in the DAC segment of this example is the first one at the upper-left of the 8x8 block. – matja Jan 27 '15 at 18:57
  • It doesn't work for windows, why? – Eugene W. Feb 28 '23 at 08:05
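
Following the comment above that the 64 DQT entries can each take any value from 1 to 255, here is a rough Python sketch that produces a series of small but different files by changing one table entry. `BASE` and `dqt_variants` are names invented for this example, the choice of the last table entry is arbitrary, and I have not checked how individual decoders render each variant.

```python
# The 125-byte file from the answer above.
BASE = bytes.fromhex(
    "ffd8"
    "ffe0" "0010" "4a46494600010101004800480000"
    "ffdb" "0043" "00"
    "0302020202020302" "0202030303030406"
    "0404040404080606" "050609080a0a0908"
    "09090a0c0f0c0a0b" "0e0b09090d110d0e"
    "0f101011100a0c12" "131210130f101010"
    "ffc9" "000b" "080001000101011100"
    "ffcc" "0006" "00101005"
    "ffda" "0008" "010100003f00" "d2cf20"
    "ffd9"
)


def dqt_variants(base, count):
    """Return `count` copies of `base`, each with a different last DQT entry."""
    if not 1 <= count <= 255:
        raise ValueError("one byte gives at most 255 distinct non-zero values")
    # Skip the marker (2 bytes), the length (2 bytes) and the table id (1 byte).
    table_start = base.index(b"\xff\xdb") + 5
    variants = []
    for value in range(1, count + 1):     # keep entries in the 1-255 range
        buf = bytearray(base)
        buf[table_start + 63] = value     # last of the 64 table entries
        variants.append(bytes(buf))
    return variants


files = dqt_variants(BASE, 20)
print(len(files), "variants,", len(set(files)), "distinct")
```

One byte only gives 255 variants; varying several table entries together would extend the series.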
13

It occurs to me you could make a progressive JPEG with only the DC coefficients, so that a single grey pixel can be encoded in 119 bytes. This reads just fine in the few programs I've tried it in (Photoshop, GNOME Image Viewer 3.38.0, GIMP 2.10.18, and others).

ff d8 ; SOI
ff db ; DQT
 00 43
 00
 01 01 01 01 01 01 01 01
 01 01 01 01 01 01 01 01
 01 01 01 01 01 01 01 01
 01 01 01 01 01 01 01 01
 01 01 01 01 01 01 01 01
 01 01 01 01 01 01 01 01
 01 01 01 01 01 01 01 01
 01 01 01 01 01 01 01 01
ff c2 ; SOF
 00 0b
 08 00 01 00 01 01 01 11 00
ff c4 ; DHT
 00 14
 00
 01 00 00 00 00 00 00 00
 00 00 00 00 00 00 00 00
 03
ff da ; SOS
 00 08
 01 01 00 00 00 01 3F
ff d9 ; EOI

The main space saving comes from having only one Huffman table. Although this is slightly smaller than the 125-byte arithmetic-coded file given in another answer, that file without its 18-byte JFIF (APP0) segment would be smaller still (125 - 18 = 107 bytes), so it should still be considered the smallest known.

The above file can be generated with:

#!/usr/bin/env bash
printf '\xff\xd8' # SOI
printf '\xff\xdb' # DQT
printf  '\x00\x43'
printf  '\x00'
printf  '\x01\x01\x01\x01\x01\x01\x01\x01'
printf  '\x01\x01\x01\x01\x01\x01\x01\x01'
printf  '\x01\x01\x01\x01\x01\x01\x01\x01'
printf  '\x01\x01\x01\x01\x01\x01\x01\x01'
printf  '\x01\x01\x01\x01\x01\x01\x01\x01'
printf  '\x01\x01\x01\x01\x01\x01\x01\x01'
printf  '\x01\x01\x01\x01\x01\x01\x01\x01'
printf  '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\xff\xc2' # SOF
printf  '\x00\x0b'
printf  '\x08\x00\x01\x00\x01\x01\x01\x11\x00'
printf '\xff\xc4' # DHT
printf  '\x00\x14'
printf  '\x00'
printf  '\x01\x00\x00\x00\x00\x00\x00\x00'
printf  '\x00\x00\x00\x00\x00\x00\x00\x00'
printf  '\x03'
printf '\xff\xda' # SOS
printf  '\x00\x08'
printf  '\x01\x01\x00\x00\x00\x01\x3F'
printf '\xff\xd9' # EOI
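
The 107-byte figure mentioned above can be reproduced by dropping the APP0 segment from the 125-byte arithmetic-coded file. The Python sketch below does that and compares the three sizes; `ARITHMETIC_125`, `PROGRESSIVE_119` and `strip_app_segments` are names I made up for the illustration, and I have not tested the stripped file in any decoder.

```python
# 125-byte arithmetic-coded file from the other answer.
ARITHMETIC_125 = bytes.fromhex(
    "ffd8"
    "ffe0" "0010" "4a46494600010101004800480000"
    "ffdb" "0043" "00"
    "0302020202020302" "0202030303030406"
    "0404040404080606" "050609080a0a0908"
    "09090a0c0f0c0a0b" "0e0b09090d110d0e"
    "0f101011100a0c12" "131210130f101010"
    "ffc9" "000b" "080001000101011100"
    "ffcc" "0006" "00101005"
    "ffda" "0008" "010100003f00" "d2cf20"
    "ffd9"
)

# 119-byte progressive file from this answer.
PROGRESSIVE_119 = (
    bytes.fromhex("ffd8")
    + bytes.fromhex("ffdb 0043 00") + b"\x01" * 64
    + bytes.fromhex("ffc2 000b 08 0001 0001 01 01 11 00")
    + bytes.fromhex("ffc4 0014 00 01") + b"\x00" * 15 + b"\x03"
    + bytes.fromhex("ffda 0008 01 01 00 00 00 01 3f")
    + bytes.fromhex("ffd9")
)


def strip_app_segments(data):
    """Return data without its APPn segments (marker codes 0xE0 to 0xEF)."""
    out = bytearray(data[:2])             # keep SOI
    pos = 2
    while pos < len(data):
        code = data[pos + 1]
        if code == 0xDA:                  # SOS: copy the scan and EOI verbatim
            out += data[pos:]
            break
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if not 0xE0 <= code <= 0xEF:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    return bytes(out)


stripped = strip_app_segments(ARITHMETIC_125)
print(len(ARITHMETIC_125), len(PROGRESSIVE_119), len(stripped))
```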
Ciro Santilli OurBigBook.com
garlon4
  • For the curious, when trying to read this with iOS' `[UIImage imageWithData:]` it outputs: `ImageIO: JPEG Corrupt JPEG data: 2 extraneous bytes before marker 0xda`. – Ricardo Sanchez-Saez Jul 31 '14 at 17:59
  • Or as a data url data:image/jpeg,%ff%d8%ff%db%00%43%00%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%01%ff%c2F%00%0b%08%00%01%00%01%01%01%11%00%ff%c4%00%14%00%01%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%03%ff%da%00%08%01%01%00%00%00%01%3F%ff%d9 – LeBleu Jun 08 '22 at 14:42
9

Try the following (134 bytes):

FF D8 FF E0 00 10 4A 46 49 46 00 01 01 01 00 48 00 48 00 00
FF DB 00 43 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF C2 00 0B 08 00 01 00 01 01 01
11 00 FF C4 00 14 10 01 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 FF DA 00 08 01 01 00 01 3F 10

Source: Worlds Smallest, Valid JPEG? by Jesse_hz
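
Another answer on this page points out that this sample does not end with an EOI marker. A quick Python check of that, assuming the bytes are exactly as listed above (`SAMPLE_134`, `has_eoi` and `with_eoi` are names chosen for this sketch):

```python
# The 134-byte sample, rebuilt from the hex listing above.
SAMPLE_134 = (
    bytes.fromhex("ffd8 ffe0 0010 4a464946 00 0101 01 0048 0048 0000")
    + bytes.fromhex("ffdb 0043 00") + b"\xff" * 64
    + bytes.fromhex("ffc2 000b 08 0001 0001 01 01 11 00")
    + bytes.fromhex("ffc4 0014 10 01") + b"\x00" * 15 + b"\x00"
    + bytes.fromhex("ffda 0008 01 01 00 01 3f 10")
)


def has_eoi(data):
    """True if the data ends with the EOI marker FF D9."""
    return data.endswith(b"\xff\xd9")


def with_eoi(data):
    """Return data with an EOI marker appended if it is missing."""
    return data if has_eoi(data) else data + b"\xff\xd9"


print(len(SAMPLE_134), has_eoi(SAMPLE_134), len(with_eoi(SAMPLE_134)))
```

Appending the marker only addresses the missing trailer; it says nothing about whether the rest of the stream is well formed.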

kenorb
7

Found "the tiniest GIF ever" with only 26 bytes.

47 49 46 38 39 61 01 00 01 00 
00 ff 00 2c 00 00 00 00 01 00 
01 00 00 02 00 3b

Python literal:

b'GIF89a\x01\x00\x01\x00\x00\xff\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x00;'
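
To see what those 26 bytes contain, here is a small Python sketch that unpacks the fixed-size header fields with `struct`. The variable names are mine, and the field layout is my reading of the GIF89a header (6-byte signature followed by the 7-byte logical screen descriptor).

```python
import struct

TINY_GIF = (b'GIF89a\x01\x00\x01\x00\x00\xff\x00,'
            b'\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x00;')

# Signature/version, then width, height, packed flags,
# background colour index and pixel aspect ratio (little-endian).
signature, width, height, flags, bg_index, aspect = struct.unpack(
    "<6sHHBBB", TINY_GIF[:13])

print(signature, width, height, flags, bg_index, aspect)
print("image separator:", hex(TINY_GIF[13]))   # 0x2c is ','
print("trailer:", hex(TINY_GIF[-1]))           # 0x3b is ';'
print("total bytes:", len(TINY_GIF))
```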
Henrique Bastos
3

While I realize this is far from the smallest valid JPEG and has little or nothing to do with your actual question, I felt I should share it: I'd been looking for a very small JPEG that actually looked like something, to do some testing with, when I found your question. I'm sharing it here because it's valid, it's small, and it makes me ROFL.

Here is a 384-byte JPEG image that I made in Photoshop. It is the letters ROFL, hand drawn by me and then saved with max compression settings while still being sort of readable.

Hex sequences:

my @image_hex = qw{
 FF D8 FF E0 00 10 4A 46 49 46 00 01 02 00 00 64
 00 64 00 00 FF EC 00 11 44 75 63 6B 79 00 01 00
 04 00 00 00 00 00 00 FF EE 00 0E 41 64 6F 62 65
 00 64 C0 00 00 00 01 FF DB 00 84 00 1B 1A 1A 29
 1D 29 41 26 26 41 42 2F 2F 2F 42 47 3F 3E 3E 3F
 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47
 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47
 47 47 47 47 47 47 47 47 47 47 47 47 01 1D 29 29
 34 26 34 3F 28 28 3F 47 3F 35 3F 47 47 47 47 47
 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47
 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47
 47 47 47 47 47 47 47 47 47 47 47 47 47 FF C0 00
 11 08 00 08 00 19 03 01 22 00 02 11 01 03 11 01
 FF C4 00 61 00 01 01 01 01 00 00 00 00 00 00 00
 00 00 00 00 00 00 04 02 05 01 01 01 01 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 02 04 10 00 02
 02 02 02 03 01 00 00 00 00 00 00 00 00 00 01 02
 11 03 00 41 21 12 F0 13 04 31 11 00 01 04 03 00
 00 00 00 00 00 00 00 00 00 00 00 00 21 31 61 71
 B1 12 22 FF DA 00 0C 03 01 00 02 11 03 11 00 3F
 00 A1 7E 6B AD 4E B6 4B 30 EA E0 19 82 39 91 3A
 6E 63 5F 99 8A 68 B6 E3 EA 70 08 A8 00 55 98 EE
 48 22 37 1C 63 19 AF A5 68 B8 05 24 9A 7E 99 F5
 B3 22 20 55 EA 27 CD 8C EB 4E 31 91 9D 41 FF D9
}; # This is a very tiny JPEG: an image representation of the letters "ROFL", hand drawn by me in Photoshop and then saved at the lowest possible quality settings where the letters could still be made out :)

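use URI::Escape;  # provides uri_escape

# Turn each two-digit hex string into one byte, then URL-escape the result.
my $image_data = pack('H2' x scalar(@image_hex), @image_hex);
my $url_escaped_image = uri_escape( $image_data );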

URL escaped binary image data (can paste right into a URL)

%FF%D8%FF%E0%00%10JFIF%00%01%02%00%00d%00d%00%00%FF%EC%00%11Ducky%00%01%00%04%00%00%00%00%00%00%FF%EE%00%0EAdobe%00d%C0%00%00%00%01%FF%DB%00%84%00%1B%1A%1A)%1D)A%26%26AB%2F%2F%2FBG%3F%3E%3E%3FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG%01%1D))4%264%3F((%3FG%3F5%3FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG%FF%C0%00%11%08%00%08%00%19%03%01%22%00%02%11%01%03%11%01%FF%C4%00a%00%01%01%01%01%00%00%00%00%00%00%00%00%00%00%00%00%00%04%02%05%01%01%01%01%00%00%00%00%00%00%00%00%00%00%00%00%00%00%02%04%10%00%02%02%02%02%03%01%00%00%00%00%00%00%00%00%00%01%02%11%03%00A!%12%F0%13%041%11%00%01%04%03%00%00%00%00%00%00%00%00%00%00%00%00%00!1aq%B1%12%22%FF%DA%00%0C%03%01%00%02%11%03%11%00%3F%00%A1~k%ADN%B6K0%EA%E0%19%829%91%3Anc_%99%8Ah%B6%E3%EAp%08%A8%00U%98%EEH%227%1Cc%19%AF%A5h%B8%05%24%9A~%99%F5%B3%22%20U%EA'%CD%8C%EBN1%91%9DA%FF%D9
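
For comparison, a rough Python equivalent of the pack-and-escape step, shown on just the first 20 bytes of the image to keep it short. It uses `urllib.parse.quote_from_bytes` with `safe=""`; Python escapes a few punctuation characters (such as `(`, `)`, `!` and `'`) that the string above leaves as-is, so the full output may differ in form while decoding to the same bytes.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# First 20 bytes of the image: SOI plus the JFIF APP0 segment.
header_hex = "FF D8 FF E0 00 10 4A 46 49 46 00 01 02 00 00 64 00 64 00 00"
header = bytes.fromhex(header_hex)

escaped = quote_from_bytes(header, safe="")
print(escaped)

# Decoding the escaped text gives back the original bytes.
print(unquote_to_bytes(escaped) == header)
```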
Martijn Pieters
1

Here's the C++ routine I wrote to do this:

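#include <cstddef>  // std::size_t
#include <cstring>  // std::memcmp

// Heuristic only: SOI marker followed by a JFIF or Exif identifier.
bool is_jpeg(const unsigned char* img_data, std::size_t size)
{
    return img_data &&
           (size >= 10) &&
           (img_data[0] == 0xFF) &&   // SOI, first byte
           (img_data[1] == 0xD8) &&   // SOI, second byte
           // The identifier follows the APPn marker (2 bytes) and length (2 bytes).
           ((std::memcmp(img_data + 6, "JFIF", 4) == 0) ||
            (std::memcmp(img_data + 6, "Exif", 4) == 0));
}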

img_data points to a buffer containing the JPEG data.

I'm sure you need more bytes to have a JPEG that will decode to a useful image, but it's a fair bet that if the first 10 bytes pass this test, the buffer probably contains a JPEG.

EDIT: You can, of course, replace the 10 above with a higher value once you decide on one. 134, as suggested in another answer, for example.

Warren Young
0

It is not a requirement that JPEGs contain either a JFIF or Exif marker. But they must start with FF D8, and they must have a marker following that, so you can check for FF D8 FF.
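
A minimal Python sketch of that check, combined with the size and footer screening from the question. `looks_like_jpeg` and `MIN_JPEG_SIZE` are hypothetical names, and the 107-byte floor is only the smallest size claimed elsewhere on this page, not a proven minimum. The EOI test is optional because, as noted in another answer, some files lack the trailer and still open.

```python
MIN_JPEG_SIZE = 107  # assumption: smallest size claimed in this thread


def looks_like_jpeg(data, min_size=MIN_JPEG_SIZE, require_eoi=True):
    """Cheap screen: size floor, FF D8 FF prefix, optional FF D9 suffix."""
    if len(data) < min_size:
        return False
    if data[:3] != b"\xff\xd8\xff":       # SOI followed by some marker
        return False
    if require_eoi and data[-2:] != b"\xff\xd9":
        return False
    return True


# Not a real JPEG, just bytes shaped to pass the screen.
shaped = b"\xff\xd8\xff" + b"\x00" * 102 + b"\xff\xd9"
print(looks_like_jpeg(shaped))                 # passes the screen
print(looks_like_jpeg(shaped[:-2]))            # too short
print(looks_like_jpeg(b"GIF89a" + shaped))     # wrong prefix
```

The first call passing shows the limit of this approach: it filters out obvious non-JPEGs cheaply, but a passing buffer still needs the more extensive inspection.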

jsam