Accessing specific binary information based on binary format documentation

Question

I have a binary file and documentation of the format the information is stored in. I'm trying to write a simple C++ program that pulls a specific piece of information from the file, but I'm missing something, since the output isn't what I expect.

The documentation is as follows:

Half-word   Field Name          Type    Units   Range       Precision
10          Block Divider       INT*2   N/A     -1          N/A
11-12       Latitude            INT*4   Degrees -90 to +90  0.001

There are other items in the file obviously but for this case I'm just trying to get the Latitude value.

My code is:

#include <cstdlib>
#include <iostream>
#include <fstream>

using namespace std;

int main(int argc, char* argv[])
{
  const char* dataFileLocation = "testfile.bin";

  ifstream dataFile(dataFileLocation, ios::in | ios::binary);

  if(dataFile.is_open())
  {
    char* buffer = new char[32768];
    dataFile.seekg(10, ios::beg);
    dataFile.read(buffer, 4);
    dataFile.close();

    cout << "value is " << (int)(buffer[0] & 255);
  }
}

The result of which is "value is 226" which is not in the allowed range.

I'm quite new to this, and here's what my intentions were when writing the above code:

Open file in binary mode
Seek to the 11th byte from the start of the file
Read in 4 bytes from that point
Close the file
Output those 4 bytes as an integer.

If someone could point out where I'm going wrong I'd sure appreciate it. I don't really understand the (buffer[0] & 255) part (took that from some example code) so layman's terms for that would be greatly appreciated.

Hex Dump of the first 100 bytes:

testfile.bin  98,402 bytes   11/16/2011   9:01:52
          -0 -1 -2 -3  -4 -5 -6 -7  -8 -9 -A -B  -C -D -E -F

00000000- 00 5F 3B BF  00 00 C4 17  00 00 00 E2  2E E0 00 00 [._;.............]
00000001- 00 03 FF FF  00 00 94 70  FF FE 81 30  00 00 00 5F [.......p...0..._]
00000002- 00 02 00 00  00 00 00 00  3B BF 00 00  C4 17 3B BF [........;.....;.]
00000003- 00 00 C4 17  00 00 00 00  00 00 00 00  80 02 00 00 [................]
00000004- 00 05 00 0A  00 0F 00 14  00 19 00 1E  00 23 00 28 [.............#.(]
00000005- 00 2D 00 32  00 37 00 3C  00 41 00 46  00 00 00 00 [.-.2.7.<.A.F....]
00000006- 00 00 00 00                                        [....            ]

What is a half-word? What is INT*2, and what is INT*4? Are the offsets zero-based or one-based? Also, please add (part of) a hexdump to the original question. — wildplasser, Nov 16 '11 at 20:42
@wildplasser: My wild guess is that we are counting units of 2-bytes, and `int*k` means "k-byte integer"... — Kerrek SB, Nov 16 '11 at 20:46
That leaves the zero-based indexing (in 16-bit units?). BTW: IIRC car engine controls work in 16-bit quantities. But on second thought: I don't think they store latitude ;-) — wildplasser, Nov 16 '11 at 20:50
Also, are the integers stored in big- or little-endian format? Assuming native endianness, after seeking `11*2` bytes from the beginning you'll probably want to read an int32_t that presumably has the range -90000..90000 — user786653, Nov 16 '11 at 20:59
Unfortunately that's all the info I have but I assumed INT*2 and INT*4 referred to a short int and an int respectively. Offsets are 1-based. — TheOx, Nov 16 '11 at 21:04
That should have been skip `10*2` bytes, I now see from the hex dump. Also, the file is in big-endian format by the looks of it. — user786653, Nov 16 '11 at 21:16

Mark Ransom · Accepted Answer · 2011-11-16T21:25:19.403

Since the documentation lists the field as an integer but shows the precision to be 0.001, I would assume that the actual value is the stored value multiplied by 0.001. The integer range would be -90000 to 90000.
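As a minimal sketch of that scaling, assuming the precision of 0.001 really does mean the stored integer counts thousandths of a degree (the function names here are hypothetical, not from the format documentation):

```cpp
#include <cstdint>

// Converts the stored integer to degrees, assuming the documented precision
// of 0.001 means "thousandths of a degree". Dividing by 1000.0 is used
// instead of multiplying by 0.001 because 0.001 has no exact binary form.
constexpr double latitude_degrees(std::int32_t raw) {
    return raw / 1000.0;
}

// True if the raw value lies inside the documented -90 to +90 degree range,
// expressed in the same thousandths-of-a-degree units.
constexpr bool latitude_in_range(std::int32_t raw) {
    return raw >= -90000 && raw <= 90000;
}
```

Checking the raw value against the range before scaling is a cheap way to notice that the wrong offset or byte order is being used.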

The 4 bytes must be combined into a single integer. There are two ways to do this, big endian and little endian, and which you need depends on the machine that wrote the file. x86 PCs for example are little endian.

int little_endian = buffer[0] | buffer[1]<<8 | buffer[2]<<16 | buffer[3]<<24;
int big_endian    = buffer[0]<<24 | buffer[1]<<16 | buffer[2]<<8 | buffer[3];

The &255 is used to remove the sign extension that occurs when you convert a signed char to a signed integer. Use unsigned char instead and you probably won't need it.
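To illustrate both points, here is a rough sketch with hypothetical helper names. The byte 0xE2 from the question, held in a signed char, has the value -30 on a two's complement machine. Widening it to int keeps it at -30, while masking with 255 gives back 226. The second part combines four unsigned bytes in big-endian order, which is what the comments suggest this file uses.

```cpp
#include <cstdint>

// Widening a negative signed char to int copies the sign bit into the upper
// bits, so the value stays negative.
constexpr int widen_signed(signed char c) { return c; }

// Masking with 255 keeps only the low 8 bits, recovering the byte as 0..255.
constexpr int widen_masked(signed char c) { return c & 255; }

// Combines four bytes, most significant first, into a signed 32-bit value.
// Taking unsigned char means no byte is sign-extended before it is shifted.
// The final cast to a signed type is implementation-defined before C++20 for
// values above INT32_MAX, but wraps as expected on two's complement targets.
constexpr std::int32_t decode_be32(const unsigned char* p) {
    return static_cast<std::int32_t>(
        (static_cast<std::uint32_t>(p[0]) << 24) |
        (static_cast<std::uint32_t>(p[1]) << 16) |
        (static_cast<std::uint32_t>(p[2]) << 8)  |
         static_cast<std::uint32_t>(p[3]));
}
```

With the question's buffer, this would mean reading into an unsigned char array (or casting the char pointer to `const unsigned char*`) before calling `decode_be32`.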

Edit: I think "half-word" refers to 2 bytes, so you'll need to skip 20 bytes instead of 10.

You were right about the "half-word" referring to 2 bytes - thank you so much for that, it was the key to my problem! — TheOx, Nov 17 '11 at 17:20
