I am writing a 'clean-room' program that requires parsing/unparsing of jpegs. I have found all the information I need to parse/unparse baseline jpegs, but I cannot find the information that I need to parse/unparse progressive jpegs.

I need to be able to convert the compressed data to macroblocks and back, so most available frameworks are too high level. I also want to understand what is going on, hence the 'clean room' approach.

Can anybody help me please? A specification of the SOF1 header would be useful, as would be the layout of the compressed data in the scan segment.

Thanks in advance.

DrPhill
  • Oooops, I seem to have misspoken. I need to understand an SOF2 header, not an SOF1 header. Sorry. – DrPhill Feb 16 '17 at 18:22
  • I have an example of an SOF2 header - its length field is 17 (the 2 length bytes plus 15 bytes of data), and the data are: [8, 3, -128, 6, 64, 3, 1, 34, 0, 2, 17, 1, 3, 17, 1]. – DrPhill Feb 16 '17 at 18:36
  • type: 194; data size: 17; data: [8, 3, -128, 6, 64, 3, 1, 34, 0, 2, 17, 1, 3, 17, 1]. This gives precision: 8; height: 896; width: 1600; 3 components: (1, 34, 0) = id 1, hSample 2, vSample 2; (2, 17, 1) = id 2, hSample 1, vSample 1; (3, 17, 1) = id 3, hSample 1, vSample 1. – DrPhill Feb 16 '17 at 18:36
  • The header structures are the same for all frame types. – user3344003 Feb 16 '17 at 19:58
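The header layout worked out in the comments above can be sketched in a few lines of Python. This is only an illustration, not code from the thread: the names `parse_sof_payload`, `FrameHeader` and `Component` are made up for the example, and it assumes the input is the segment payload after the marker and the 2-byte length field. Signed values such as -128 (as a Java byte array would show them) are masked back to 0..255 first.

```python
from typing import List, NamedTuple


class Component(NamedTuple):
    component_id: int
    h_sampling: int
    v_sampling: int
    quant_table: int


class FrameHeader(NamedTuple):
    precision: int
    height: int
    width: int
    components: List[Component]


def parse_sof_payload(payload) -> FrameHeader:
    """Parse an SOFn payload (the bytes after the 2-byte length field)."""
    # Accept Java-style signed bytes (-128..127) as well as 0..255.
    data = [b & 0xFF for b in payload]
    if len(data) < 6:
        raise ValueError("SOF payload too short")
    precision = data[0]
    height = (data[1] << 8) | data[2]   # big-endian 16-bit
    width = (data[3] << 8) | data[4]    # big-endian 16-bit
    count = data[5]
    if len(data) != 6 + 3 * count:
        raise ValueError("payload length does not match component count")
    components = []
    for i in range(count):
        cid, sampling, tq = data[6 + 3 * i: 9 + 3 * i]
        # High nibble is the horizontal factor, low nibble the vertical one.
        components.append(Component(cid, sampling >> 4, sampling & 0x0F, tq))
    return FrameHeader(precision, height, width, components)


# The SOF2 payload quoted in the comments above.
header = parse_sof_payload([8, 3, -128, 6, 64, 3, 1, 34, 0, 2, 17, 1, 3, 17, 1])
print(header.height, header.width)  # 896 1600
for c in header.components:
    print(c)
```

The same function should work for an SOF0 payload, since the thread confirms that the two headers share one layout.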

1 Answer

If you want to figure this out, I'd get this book:

https://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434/ref=sr_1_5?ie=UTF8&qid=1486949641&sr=8-5&keywords=jpeg

It explains it all in easy-to-understand terms. The author has source code at

http://colosseumbuilders.com/sourcecode/imagelib403.zip

that is designed to be easy to understand.

The SOF1 header has the same layout as all other SOF headers. You will need a copy of the JPEG standard (as dense as it is); the other sources above will help you get through it.

user3344003
  • This would be an acceptable comment to the original question. It is not, however, an answer. – Ken White Feb 13 '17 at 01:45
  • It did prod me to investigate with a byte-inspector and find out that the assertion was true and that I could actually continue in partial ignorance. So thanks for the nudge – DrPhill Feb 16 '17 at 18:44
  • The problem you have is that in order to parse a JPEG stream, you largely need to decode it. You can examine the blocks. A progressive scan has the same format and restart markers as a sequential scan, but getting to the actual data is tough. – user3344003 Feb 16 '17 at 19:18
  • I thank you for confirming that - I had begun to suspect that it was the case. I have analysed an SOF2 header and it *does* match the layout of the SOF0 header. I now have a problem in that I can only find two Huffman tables when I would expect four (AC/DC for luminance/chrominance). As for the order of the blocks - oddly enough it does not matter for my program. If the only difference is that they appear in a convoluted order then I am good to go as soon as I find my missing Huffman tables. – DrPhill Feb 17 '17 at 22:19
  • The number of Huffman tables is specified in the other blocks. There do not have to be 4 tables. There are usually at least 2. – user3344003 Feb 17 '17 at 23:08
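One way to check how many Huffman tables a file really defines is to walk each DHT segment payload, as in the sketch below. A single DHT segment can carry several tables back to back, and DHT segments can appear more than once in a file, so a count taken from the first segment alone may come up short. The function name `list_huffman_tables` and the sample payload are made up for this example, and the input is assumed to be the payload after the 2-byte length field.

```python
def list_huffman_tables(payload):
    """Return (table_class, table_id, symbol_count) for each table in a DHT payload."""
    # Accept Java-style signed bytes (-128..127) as well as 0..255.
    data = [b & 0xFF for b in payload]
    tables = []
    pos = 0
    while pos < len(data):
        if pos + 17 > len(data):
            raise ValueError("truncated DHT table header")
        tc_th = data[pos]
        counts = data[pos + 1: pos + 17]   # number of codes of length 1..16
        n_symbols = sum(counts)
        if pos + 17 + n_symbols > len(data):
            raise ValueError("truncated DHT symbol list")
        # High nibble: 0 = DC table, 1 = AC table. Low nibble: table id.
        table_class = "DC" if (tc_th >> 4) == 0 else "AC"
        tables.append((table_class, tc_th & 0x0F, n_symbols))
        pos += 17 + n_symbols
    return tables


# Synthetic payload (not from a real file): one DC table and one AC table.
dc_table = [0x00] + [0, 1] + [0] * 14 + [0]      # one code of length 2
ac_table = [0x10] + [0, 2] + [0] * 14 + [1, 2]   # two codes of length 2
print(list_huffman_tables(dc_table + ac_table))
# [('DC', 0, 1), ('AC', 0, 2)]
```

Running this over every DHT segment in the file, not just those before the first scan, should show which table classes and ids are actually present.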