1

I am trying to write a program that reads in binary files using C++. I am running into some unusual output that I was hoping for some help with.

I have a binary file that starts with these 4 bytes:

A1 B2 C3 D4 (verified using hexdump -C)

And here is the code I am using to read these 4 bytes in:

#include <iostream> // for reading and writing files
#include <fstream>

using namespace std;

char * buffer;
unsigned int LENGTH = 4;

int main(int argc, char ** argv)
{
    // open the binary file
    ifstream infile ("dump", ios::binary | ios::in);

    // create a buffer
    buffer = new char[LENGTH];

    // read LENGTH bytes from the file
    infile.read(buffer, LENGTH);

    // append the null byte to terminate the string
    buffer[LENGTH] = '\0';

    // loop over the 4 bytes read, and print
    for(int i = 0; i < LENGTH; i++)
    {
       printf("Buffer[%d] is %X\n", i, buffer[i]);
    }

    delete[] buffer;
    infile.close();


    return 0;
}

This program gives me these actual results:

Buffer[0] is FFFFFFA1
Buffer[1] is FFFFFFB2
Buffer[2] is FFFFFFC3
Buffer[3] is FFFFFFD4

BUT, I would expect these results:

Buffer[0] is A1
Buffer[1] is B2
Buffer[2] is C3
Buffer[3] is D4

Can anyone explain to me where the 3 0xFF bytes are coming from ? It only seems to be affecting the first 4 bytes of the file, the next 4 bytes print out as expected without any 0xFF bytes prepended.

Hunter McMillen
  • 59,865
  • 24
  • 119
  • 170
  • 1
    You are reading bytes and printing them as integers. If the bytes are larger than 127 per value, the integers expanded will be negative then (the sign is the first bit of the byte, which is set in all values you read). The conversion is done in the call to the printf function (default conversion char to int). The off-by-one error doesn't matter here, but should be of course corrected. – rubber boots Sep 03 '12 at 14:45

2 Answers2

8

The char buffer[i] is default promoted when passed as a variable argument. To get the right value, say static_cast<unsigned char>(buffer[i]).

Moreover, buffer[LENGTH] is out of bounds and thus undefined behaviour.

Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
  • Thanks, that was exactly the problem. Just out of curiosity why is a `static_cast` here better than a regular cast. `(unsigned char)`? – Hunter McMillen Sep 03 '12 at 14:48
  • @HunterMcMillen: You're confused: `static_cast` **is** the regular cast. What you said is barely C++, and mostly evil. :-) – Kerrek SB Sep 03 '12 at 14:52
  • I haven't really had much experience with C++. I have worked with C a little bit and am just used to seeing those C style casts. So static_cast is the recommended way of casting in C++? – Hunter McMillen Sep 03 '12 at 14:54
  • @HunterMcMillen: Yes. And *especially* if you come from C, it's worthwhile to always step back, have a coffee and say to yourself three times that "C++ is not C". – Kerrek SB Sep 03 '12 at 14:55
  • @HunterMcMillen - `static_cast` is the recommended way of doing the particular casts that it supports. That's the benefit of the four built-in C++ casts: each does a particular set of things, unlike a C-style cast, which does pretty much everything and sometimes overreaches. – Pete Becker Sep 03 '12 at 15:04
5

You can't read LENGTH bytes of content, and terminate at offset LENGTH in a buffer of size LENGTH. This is an off-by-one error.

buffer = new char[LENGTH];

This gives you space for LENGTH characters, with indices 0 through LENGTH - 1. So this:

buffer[LENGTH] = '\0';

writes outside the allocated memory, invoking undefined behavior.

unwind
  • 391,730
  • 64
  • 469
  • 606
  • Thank you for pointing this out, I was more concerned about with the numbers were wrapping around to negative. But this is an important piece that I missed also. – Hunter McMillen Sep 03 '12 at 14:48