
This is a question, out of curiosity, about some patterns I see in JPG files when I look at them in a hex editor. I guess it is really a question about the JPEG file format: why do these parts not look like "random noise" like the rest of the data, when they are supposed to (Huffman coding and so on)?

Here goes:

This 136-bit (17 bytes) pattern is showing up in some JPG files that are produced by Adobe Photoshop (I do not know if Photoshop is the only application that produces these):

F7 5E EB DE FD D7 BA F7 BF 75 EE BD EF DD 7B AF 7B

It appears in several places within a single file. Sometimes there is just one instance; other times it is repeated 8 or 12 times, making up blocks of 1088 or 1632 bits. Or, to be precise, it is actually a 68-bit pattern, repeated 2 or more times:

F7 5E EB DE FD D7 BA F7 B

11110111010111101110101111011110111111011101011110111010111101111011
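The hex-to-binary expansion above can be double-checked with a few lines of Python. This is only a sketch; the helper name `hex_to_bits` is made up for illustration.

```python
def hex_to_bits(hex_digits: str) -> str:
    """Expand a string of hex digits into a bit string, 4 bits per digit."""
    digits = hex_digits.replace(" ", "")
    return "".join(format(int(d, 16), "04b") for d in digits)


# The 68-bit unit (17 hex digits) and the 17-byte run quoted above.
unit_68 = hex_to_bits("F7 5E EB DE FD D7 BA F7 B")
full_136 = hex_to_bits("F7 5E EB DE FD D7 BA F7 BF 75 EE BD EF DD 7B AF 7B")

print(len(unit_68), len(full_136))  # bit lengths of the two strings
print(full_136 == unit_68 * 2)      # is the 17-byte run two copies of the unit?
```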

AFAIK, from reading a bit about the JPG file structure and verifying it in hex, the segments of a JPG file each begin with an FF xx marker. There are no such FF xx markers immediately before or after those 68-bit patterns.

By using Breakpoint Hex Workshop, it is very easy to spot those patterns in the "Data Visualizer" window: while the rest of the Huffman bitstream looks like "noise", there are suddenly blocks showing clear patterns.

Also.. I am not sure how relevant this is, but..:

Earlier, I noticed this type of pattern in CR2 files as well, that is, Canon RAW files; there the pattern was a much simpler 40-bit one, though:

73 9C E7 39 CE

0111 0011 1001 1100 1110 0111 0011 1001 1100 1110

If I adjust the spaces, it becomes this:

01110 01110 01110 01110 01110 01110 01110 01110

As you can see, this is actually a repeating 5-bit pattern, and it was repeated several hundred times at each place it appeared in the CR2 files. CR2 is also a compressed format, but a lossless one. Then again, the Huffman coding in JPG is also a kind of lossless "compression", if I have understood it correctly.
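Regrouping the bits by hand can also be done mechanically. The sketch below finds the smallest repeat length of a bit string; the function names are hypothetical and the search is a plain brute-force loop, which is fine for strings this short. Applied to the bytes quoted in this question, it gives 5 for the CR2 sample and 34 for the 68-bit JPG string. A 34-bit unit is not aligned to a hex digit, which would explain why the repetition only becomes visible every 68 bits in a hex view.

```python
def hex_to_bits(hex_digits: str) -> str:
    """Expand a string of hex digits into a bit string, 4 bits per digit."""
    digits = hex_digits.replace(" ", "")
    return "".join(format(int(d, 16), "04b") for d in digits)


def smallest_period(bits: str) -> int:
    """Smallest p such that bits[i] == bits[i - p] for every i >= p."""
    for p in range(1, len(bits) + 1):
        if all(bits[i] == bits[i - p] for i in range(p, len(bits))):
            return p
    return len(bits)


cr2_bits = hex_to_bits("73 9C E7 39 CE")             # CR2 sample from above
jpg_bits = hex_to_bits("F7 5E EB DE FD D7 BA F7 B")  # 68-bit JPG sample

print(smallest_period(cr2_bits), cr2_bits[:5])
print(smallest_period(jpg_bits))
```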

I find it very strange that in compressed streams, there are these patterns of (what to me seems to be) "wasted" bits..

I have uploaded one of the JPG files here: https://i.stack.imgur.com/8dmjj.jpg (it's just a simple screenshot of some files in a folder). The Huffman code bitstream goes from offset 0x0000027C to the end, and you can see one of the instances of the repeating pattern at, for example, offset 0x0001604A.
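For anyone who wants to locate the repeats without a hex editor, here is a rough Python sketch. It makes several simplifying assumptions: it treats the first FF DA in the file as the start of the scan and the last FF D9 as its end (which can be wrong if the file embeds a thumbnail), it only finds byte-aligned copies of the 17-byte run, and the function name `pattern_offsets` is made up for this example.

```python
PATTERN = bytes.fromhex("F75EEBDEFDD7BAF7BF75EEBDEFDD7BAF7B")  # the 17-byte run
SOS = b"\xFF\xDA"  # start-of-scan marker
EOI = b"\xFF\xD9"  # end-of-image marker


def pattern_offsets(data: bytes, pattern: bytes = PATTERN) -> list:
    """Return file offsets of byte-aligned copies of pattern inside the scan."""
    start = data.find(SOS)
    if start < 0:
        return []
    end = data.rfind(EOI)
    if end < 0:
        end = len(data)
    offsets = []
    pos = data.find(pattern, start, end)
    while pos != -1:
        offsets.append(pos)
        # Step by one byte so overlapping copies would also be reported.
        pos = data.find(pattern, pos + 1, end)
    return offsets
```

With a local copy of the image (the file name is whatever you saved it as), something like `pattern_offsets(open("8dmjj.jpg", "rb").read())` would list the offsets; the count may differ from a tool that also matches copies that are not byte-aligned.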

ForguesR
  • Where in the JPEG stream are these patterns occurring? What marker? – user3344003 Nov 02 '14 at 05:06
  • @user3344003: They are spread out a lot of places in the Huffman coding bitstream. I.e. they all appear somewhere between FFDA and the FFD9 at the end. In the image I uploaded which contains them, you can see the FFDA is located at offset 0x0000027C. The first instance of the pattern comes at 0x0000264A, the next at 0x00003AAB, etc etc. At 0x0001604A there are a whole bunch of them one after the other. PowerGrep tells me it finds 75 matches of the pattern, spread out over the whole area of the Huffman coding bitstream. – HackeyStack Nov 02 '14 at 05:30
  • have you tried decoding DCT co-efficients from these values? – Jimmy Nov 02 '14 at 14:29
  • The image you posted has large areas of white. The AC coefficients for each solid white MCU are going to encode the same way. Where you have black on white, the Cb and Cr components MCUs are likely to be the same. – user3344003 Nov 02 '14 at 14:35

2 Answers


Correct me if I'm wrong, but I'm thinking this could be some kind of 'blueprint' for checking whether Photoshop has been used. Maybe all of this is piracy-related.

J88
  • In this image here i.imgur.com/XBPelZd.png you may see the obvious patterns in the Data Visualizer tab on the left side. I was also first thinking about "hidden tags" - but then, why make them **this** obvious if they are meant to be hidden? You could be right, of course. This second screenshot shows a hex view and data visualization of a 5.84 MB JPG file, which is called Tourist-map-of-Europolis.jpg, and is from the Dreamfall Chapters Special Edition which I bought from Steam a couple of days ago. It was there I first discovered that pattern in JPG. Then I found other files also.. – HackeyStack Nov 02 '14 at 06:46
  • Yeah, like I said, I could be wrong, but I once heard somewhere that applications like Dreamweaver put them in their HTML files as well. Otherwise, IMO, it would be nearly impossible to tell whether something was made in a WYSIWYG app or in, for example, Notepad. – J88 Nov 02 '14 at 08:35

User3344003, thank you very, very much for your answer, it is 99.9% correct..! :-)

These patterns are, as you wrote, related to large areas of color!

However, it is actually the color black (0,0,0) that creates this particular pattern:

F75EEBDEFDD7BAF7BF75EEBDEFDD7BAF7B

..or, when split in 2 x 68-bit parts;

F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B

To see it in action:

1) Create a 32 pixel x 32 pixel image filled with pure black (0,0,0) in Photoshop.

2) Choose File -> Save for Web & Devices

3) Select JPEG, with Maximum (Quality = 100), Blur = 0, and with all the Progressive / Optimized / Embed Color Profile / Convert to sRGB options = OFF, Metadata = None.

Now when you look at the image in a hex editor, it will show this Huffman Coding bitstream:

FFDA
000C03010002110311003F00F9FF00FB
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF
FFD9

As you can see, it contains seven complete instances of the 68-bit pattern plus a truncated eighth one.
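The count can be checked directly on the dump. In the sketch below, the 12-byte scan header that follows FFDA (its length field is 000C) is left out, so the string starts at F9FF00FB; the variable names are only illustrative.

```python
UNIT = "F75EEBDEFDD7BAF7B"  # the 68-bit pattern as 17 hex digits

# Entropy-coded data of the 32 x 32 black image, copied from the dump above.
dump = """
F9FF00FB
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF
"""
stream = "".join(dump.split())

full_copies = stream.count(UNIT)                     # complete repeats
tail = stream[stream.rfind(UNIT) + len(UNIT):]       # what follows the last one

print(full_copies, tail, UNIT.startswith(tail))
```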

Similarly, if you instead create a 32 pixels x 32 pixels image filled with pure white (255,255,255) (and save it as a JPEG in the same way as above), you get this Huffman Coding bitstream:

FFDA
000C03010002110311003F00DFE3DFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7  F
FFD9

I also tried to create a 64 pixels x 64 pixels image, divided in the middle, with the left 32 pixels x 64 pixels pure black (0,0,0), and the right 32 pixels x 64 pixels pure white (255,255,255). Then saved as JPEG with Quality = 100 etc. etc. I then got this Huffman Coding bitstream:

FFDA
000C03010002110311003F00F9FF00FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BA
FFD9

When I found out, I first thought: "But isn't that Huffman Coding supposed to be more efficient than.. this..!? 8 identical patterns in the 32 pixels x 32 pixels pure colored ones, and 16 + 8 + 8 identical ones in the 64 pixels x 64 pixels half black / half white one..? Why not just use one, and then use pointers, like, use this particular pattern here, here, there and ..there."
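The "use pointers" idea is essentially what dictionary coders such as DEFLATE do with back-references. Huffman coding on its own only maps each symbol to a code, so identical input blocks produce identical output bits. As a rough illustration (this uses Python's `zlib`, which is not part of JPEG, and the helper name is made up), running DEFLATE over a long run of the pattern collapses it to a small fraction of its size:

```python
import zlib

PATTERN = bytes.fromhex("F75EEBDEFDD7BAF7BF75EEBDEFDD7BAF7B")  # the 17-byte run


def deflated_size(data: bytes) -> int:
    """Size in bytes of data after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


repeated = PATTERN * 100  # 1700 bytes of nothing but the pattern

print(len(repeated), deflated_size(repeated))
```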

Then I remembered that these JPEGs are actually pretty unusual in that they were all made with Quality = 100.

So that Quality = 100 seems to be the other factor which is needed for seeing these F75E.. patterns.

To verify this, I again made a 32 pixels x 32 pixels pure black (0,0,0) image, but this time I saved it with Quality = 0. This image got a much shorter Huffman Coding bitstream, which indeed also showed a certain kind of pattern, but a very different one:

FFDA
000C03010002110311003F00F99
55540555501
55540555503F
FFD9
  • If an MCU consists of one color, the DCT yields all-zero values for the AC coefficients. The 63 all-zero coefficients can be encoded in exactly the same way. – user3344003 Nov 03 '14 at 00:16
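
The point about single-color blocks can be checked numerically. The sketch below is a straightforward, unoptimized 8 x 8 DCT-II in plain Python, not the code any real encoder uses; it assumes a luma block of pure black, i.e. every sample equal to 0 - 128 after JPEG's level shift. Only the DC coefficient comes out non-zero, and all 63 AC coefficients vanish up to rounding error, so every such block presents the same AC data to the Huffman stage.

```python
import math

N = 8


def dct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block given as a list of 8 rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


# Pure black luma block after the level shift: every sample is 0 - 128.
black = [[-128.0] * N for _ in range(N)]
coeffs = dct_8x8(black)

dc = coeffs[0][0]
largest_ac = max(abs(coeffs[u][v])
                 for u in range(N) for v in range(N) if (u, v) != (0, 0))

print(round(dc), largest_ac < 1e-9)
```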