I'm trying to parse a JPEG file. This page says that the format is the following:

0xFF+Marker Number(1 byte)+Data size(2 bytes)+Data(n bytes)

So, when I encounter a 0xFF, I read the data like this (s is the JPEG file stream):

int marker, size;
byte[] data;
//marker number (1 byte)
marker = s.ReadByte();
//size (2 bytes)
byte[] b = new byte[2];
s.Read(b, 0, 2);
size = BitConverter.ToInt16(b, 0);

The problem is that size is then -7937, which makes the following lines throw an exception because I try to allocate a byte[] of length -7937. b[0] == 255 and b[1] == 224.

I suspect I'm not using BitConverter.ToInt16 correctly, but I can't see what I did wrong.

The BitConverter doc page says that "The order of bytes in the array must reflect the endianness of the computer system's architecture", but when I do this:

byte a = b[0]; b[0] = b[1]; b[1] = a;
size = BitConverter.ToInt16(b, 0);

...I get size == -32, which is not much better.

What's the problem?

    Perhaps it's an unsigned short, so you should use `BitConverter.ToUInt16()` – Matthew Watson Aug 09 '16 at 11:09
    The bits are obviously `0xFFE0` or `0xE0FF`, both have the most significant bit set, so Matthew seems right, it should be an unsigned word. Use `uint size = BitConverter.ToUInt16()`. – René Vogt Aug 09 '16 at 11:14
    have a look at http://stackoverflow.com/a/8227753/6007877 I think you have forgotten about the 0xE1 – Git Aug 09 '16 at 11:20
  • Don't use `BitConverter` when parsing protocols, because you want the byte-order handling to work the same regardless of the architecture. [This answer](http://stackoverflow.com/a/7190266/69809) says that, while JPEG data is **big endian** (meaning `BitConverter` on x86 *won't* work), the header can be encoded in both ways. [This page](http://www.fileformat.info/format/jpeg/corion.htm) also gives some hints on how to detect header endianness. – vgru Aug 09 '16 at 11:59
  • Thanks all, Matthew was right: using uint, I get at least some UTF-8-encoded fields. Why post a comment and not an answer? Also, gismo's link is much clearer than the page I linked in the question. – Hey Aug 09 '16 at 12:20
  • @Groo what do you suggest? This is the first time I've tried to parse a binary format, and I don't really know how to do it properly. Is there another built-in function that takes the endianness as an argument? – Hey Aug 09 '16 at 12:26
    You can check out Jon Skeet's [miscutil](http://www.yoda.arachsys.com/csharp/miscutil/) (a bit dated, last version seems to be from 2009), which contains both big and little endian converters. Or, you can write them yourself (it's basically `(a[i] << 8) | a[i + 1]` vs `a[i] | (a[i + 1] << 8)`, but you will need to implement it for 16, 32 and 64 bits). And of course keep the sign in mind, `ToInt16` is not the same as `ToUInt16`. – vgru Aug 09 '16 at 12:35
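
The hand-written converters mentioned in the last comment could look roughly like the sketch below. The class and method names (`EndianReader`, `ReadUInt16BigEndian`, `ReadUInt16LittleEndian`) are made up for illustration and are not part of .NET or miscutil; only the unsigned 16-bit case is shown.

```csharp
using System;

// Hypothetical helper class, not part of any library.
static class EndianReader
{
    // Big endian: the most significant byte comes first in the array.
    public static ushort ReadUInt16BigEndian(byte[] a, int i)
    {
        return (ushort)((a[i] << 8) | a[i + 1]);
    }

    // Little endian: the least significant byte comes first in the array.
    public static ushort ReadUInt16LittleEndian(byte[] a, int i)
    {
        return (ushort)(a[i] | (a[i + 1] << 8));
    }
}
```

Because the shifts do not depend on the host byte order, `EndianReader.ReadUInt16BigEndian(new byte[] { 0xFF, 0xE0 }, 0)` gives 65504 on any architecture.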

2 Answers

Integers are stored in Big Endian order in JPEG. If you are on a little endian system (e.g. Intel) you need to reverse the order of the bytes in the length field. Length fields are unsigned.
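
As a minimal sketch of that advice, assuming the stream is positioned at the first byte of a length field, the two bytes can be reversed only when the host is little endian and then read as unsigned. `JpegLength` and `ReadSegmentLength` are hypothetical names chosen for this example.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
static class JpegLength
{
    // Reads a 2-byte big-endian unsigned length from the current stream position.
    public static ushort ReadSegmentLength(Stream s)
    {
        byte[] b = new byte[2];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop until both arrive.
        while (read < 2)
        {
            int n = s.Read(b, read, 2 - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        // BitConverter uses the host byte order, so reverse only on little-endian machines.
        if (BitConverter.IsLittleEndian) Array.Reverse(b);
        return BitConverter.ToUInt16(b, 0);
    }
}
```

Note that a JPEG segment length counts its own two bytes, so the payload that follows is the returned value minus 2 bytes long.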

  • Thank you, but I already tried to reverse the two bytes (see the end of my question). The unsigned int was the solution. – Hey Aug 10 '16 at 08:29

The length field is an unsigned 16-bit integer. Reading it with BitConverter.ToUInt16 and storing it in a uint fixed it.
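
To illustrate where the negative numbers in the question came from, here is a small sketch (the `SignDemo` class and `AsSigned` method are invented for this example) that reinterprets the same 16 bits as signed, which is effectively what `BitConverter.ToInt16` does.

```csharp
using System;

// Hypothetical demo class, for illustration only.
static class SignDemo
{
    // Reinterprets a 16-bit pattern as a signed value.
    // When the top bit is set, the result is the unsigned value minus 65536.
    public static short AsSigned(ushort value)
    {
        return unchecked((short)value);
    }
}
```

With b[0] == 255 and b[1] == 224, the two possible byte orders are 0xE0FF (57599 unsigned) and 0xFFE0 (65504 unsigned). `SignDemo.AsSigned(0xE0FF)` is -7937 and `SignDemo.AsSigned(0xFFE0)` is -32, which matches both values seen in the question.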
