I am trying to read a binary file into memory, and then use it like so:

struct myStruct {
    std::string mystring; // is 40 bytes long
    uint myint1; // is 4 bytes long
};

typedef unsigned char byte;

byte *filedata = ReadFile(filename); // reads file into memory, closes the file
myStruct aStruct;
aStruct.mystring = filedata.????

I need a way of accessing the binary file data at an offset and reading a given number of bytes from that offset. This is easy if I store the file data in a std::string (filedata.substr(offset, len)), but I figured that using a std::string to store binary data is not a good way of doing things.

Reasonably extensive (IMO) searching hasn't turned up anything relevant. Any ideas? I am willing to change the storage type (e.g. to std::vector) if you think it is necessary.

LordAro
  • If you have a byte* to the head of the data in memory, why don't you just walk down the length, copying the data as you go? As long as you increment your pointer and know how far to go, it is all good and easy. – StarPilot Jan 29 '13 at 20:10
  • but how can i get a specific length from that, and not just the byte at the current position of the pointer? – LordAro Jan 29 '13 at 20:11
  • @LordAro it sounded like you knew the length of the string is 40 bytes long, followed by a 4 byte integer. Also beware of endianness doing it this way. – lcs Jan 29 '13 at 20:13
  • the actual code is: string[40], string[4], string[12], uint, uint, but that doesn't really matter. I am also aware of the scary endianness. However, that still does not resolve the problem, which i guess is this: How can i access more than 1 byte when accessing like this: filedata[pointer] ?? – LordAro Jan 29 '13 at 20:16
  • Basics of pointer magic. If your data fields are fixed length, then it is simple C coding. Set a currentpointer equal to your starting point. Copy 40 characters from it (your mystring set length). Advance your currentpointer by 40. Set myint1 = *(int*)currentpointer. Increment currentpointer by sizeof(int). If you have reached the end of your data, stop. If not, repeat the string copy and int setting until you do. Note: This only works with fixed length strings. If you have variable length, then you should serialize into memory the same way the data was serialized to disk/storage. – StarPilot Jan 29 '13 at 20:18
  • Dereferencing and Casting. That's the secret of computers. It is all bits and bytes. We use casting to magically change bytes into UINTs or DATEs or whatever else is needed. – StarPilot Jan 29 '13 at 20:21
  • Just out of curiosity, why aren't you serializing it directly into your in-memory data structures? Why bother loading it into memory first and then serializing that into usable data? – StarPilot Jan 29 '13 at 20:22
  • You need to dereference the byte pointer and store the result in whatever data you're using. `struct1.string = std::string((char *)bytePtr, 40); struct1.int1 = *(int *)(bytePtr + sizeof(char)*40);`. Again, beware of endianness, you're much better off serializing your data in. – lcs Jan 29 '13 at 20:25
  • And a little gotcha on casting your `currentpointer` to a string and then doing a string copy function to make a "proper" string for your string data field--- use a fixed length string copy function so it won't go on forever looking for a null terminator. – StarPilot Jan 29 '13 at 20:27
  • Take a look at Boost.Serialization (`boost::serialization`) and also search the web for "c++ serialize". – Thomas Matthews Jan 29 '13 at 20:29
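
A minimal sketch of the pointer-walking approach described in the comments above, assuming fixed-length fields and a file written in the host's byte order. ReadString and ReadUint32 are hypothetical helper names made up for illustration, not part of any library, and neither one checks bounds.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: copies `len` raw bytes starting at `offset` into a std::string.
// The caller must ensure that offset + len does not exceed the size of the buffer.
std::string ReadString(const unsigned char* data, std::size_t offset, std::size_t len)
{
    return std::string(reinterpret_cast<const char*>(data + offset), len);
}

// Hypothetical helper: copies 4 bytes starting at `offset` into a 32-bit integer.
// std::memcpy avoids an unaligned read; the result is in the host's byte order.
std::uint32_t ReadUint32(const unsigned char* data, std::size_t offset)
{
    std::uint32_t value = 0;
    std::memcpy(&value, data + offset, sizeof(value));
    return value;
}
```

With the layout from the question, this would be used as `aStruct.mystring = ReadString(filedata, 0, 40);` followed by `aStruct.myint1 = ReadUint32(filedata, 40);`.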

2 Answers

If you're not going to use a serialization library, then I suggest adding serialization support to each class:

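#include <cstring> // std::memcpy
#include <string>

struct My_Struct
{
    std::string my_string;
    unsigned int my_int;

    // Reads the members from the buffer and advances p_buffer past them.
    void Load_From_Buffer(unsigned char const *& p_buffer)
    {
        // The string is stored as nul-terminated text.
        my_string = std::string(reinterpret_cast<char const *>(p_buffer));
        p_buffer += my_string.length() + 1; // +1 to account for the terminating nul character.
        // memcpy avoids an unaligned read; assumes the file uses the host's byte order.
        std::memcpy(&my_int, p_buffer, sizeof(my_int));
        p_buffer += sizeof(my_int);
    }
};

unsigned char * const buffer = ReadFile(filename);
unsigned char const * p_buffer = buffer; // working pointer; buffer keeps the start of the data
My_Struct my_variable;
my_variable.Load_From_Buffer(p_buffer);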

Some other useful interface methods:

unsigned int Size_On_Stream(void) const; // Returns the size the object would occupy in the stream.
void Store_To_Buffer(unsigned char *& p_buffer); // Stores object to buffer, increments pointer.
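
As a rough sketch, those two methods could look like this for the structure above, assuming the same layout that Load_From_Buffer reads (nul-terminated text followed by the integer in host byte order). The struct is repeated here only so that the snippet compiles on its own.

```cpp
#include <cstring>
#include <string>

struct My_Struct
{
    std::string my_string;
    unsigned int my_int;

    // Bytes the object occupies in the stream: the text, its nul terminator, then the integer.
    unsigned int Size_On_Stream(void) const
    {
        return static_cast<unsigned int>(my_string.length() + 1 + sizeof(my_int));
    }

    // Writes the members in the order Load_From_Buffer expects and advances the pointer.
    // The caller must provide at least Size_On_Stream() bytes of room.
    void Store_To_Buffer(unsigned char *& p_buffer)
    {
        std::memcpy(p_buffer, my_string.c_str(), my_string.length() + 1);
        p_buffer += my_string.length() + 1;
        std::memcpy(p_buffer, &my_int, sizeof(my_int));
        p_buffer += sizeof(my_int);
    }
};
```

Calling Size_On_Stream first lets the caller allocate a buffer of exactly the right size before storing.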

With templates you can extend the serialization functionality:

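// Overload for std::string: reads nul-terminated text and advances the pointer.
void Load_From_Buffer(std::string& s, unsigned char const *& p_buffer)
{
    s = std::string(reinterpret_cast<char const *>(p_buffer));
    p_buffer += s.length() + 1;
}

// Generic version: forwards to the object's own Load_From_Buffer method.
template <typename T>
void Load_From_Buffer(T& object, unsigned char const *& p_buffer)
{
  object.Load_From_Buffer(p_buffer);
}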

Edit 1: Reason not to write structure directly

In C and C++, the size of a structure may not be equal to the sum of the size of its members.
Compilers are allowed to insert padding, or unused space, between members so that each member is aligned on an address boundary suitable for its type.

For example, a 32-bit processor likes to fetch things on 4 byte boundaries. Having one char in a structure followed by an int would make the int on relative address 1, which is not a multiple of 4. The compiler would pad the structure so that the int lines up on relative address 4.
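
A small sketch of that effect. Padded is a made-up struct for illustration; the exact offsets and sizes are implementation-defined, so the checks below only state what holds on every conforming compiler (C++11 or later is assumed for alignof and static_assert).

```cpp
#include <cstddef> // offsetof, std::size_t

// Hypothetical struct: one char followed by an int.
struct Padded
{
    char tag;
    int  number;
};

// The int always starts on a boundary suitable for int, so it cannot sit at
// relative address 1 on a platform where int requires more than 1-byte alignment.
static_assert(offsetof(Padded, number) % alignof(int) == 0, "number is aligned");

// Padding can only add bytes, never remove them.
static_assert(sizeof(Padded) >= sizeof(char) + sizeof(int), "struct holds its members");

// Number of bytes in the struct that belong to no member.
std::size_t PaddingBytes()
{
    return sizeof(Padded) - (sizeof(char) + sizeof(int));
}
```

On a typical platform with a 4-byte int aligned to 4 bytes, PaddingBytes() would return 3, which is why writing the raw bytes of the structure to a file also writes bytes that carry no data.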

Structures may contain pointers or items that contain pointers.
For example, the std::string type may have a size of 40, although the string may contain 3 characters or 300. It has a pointer to the actual data.

Endianness.
With multibyte integers, some processors store the Most Significant Byte (MSB) first, a.k.a. Big Endian (the way humans read numbers), while others store the Least Significant Byte first, a.k.a. Little Endian. A file written as raw bytes on one kind of machine yields different values when read the same way on the other. The Little Endian format takes less circuitry to read than the Big Endian.
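
One way to sidestep the host's byte order is to assemble the integer from individual bytes, so the file format alone decides the order. The sketch below assumes 32-bit unsigned values; ReadUint32LE and ReadUint32BE are hypothetical helper names, not library functions.

```cpp
#include <cstdint>

// Hypothetical helper: builds a 32-bit value from 4 bytes stored
// least significant byte first (Little Endian file format).
std::uint32_t ReadUint32LE(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: builds a 32-bit value from 4 bytes stored
// most significant byte first (Big Endian file format).
std::uint32_t ReadUint32BE(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}
```

Because the shifts operate on values rather than on memory, both functions give the same result on Big Endian and Little Endian processors.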

Edit 2: Variant records

When outputting things like arrays and containers, you must decide whether you want to output the full container (including unused slots) or output only the items in the container. Outputting only the items in the container would use a variant record technique.

There are two techniques for outputting variant records: the quantity followed by the items, or the items followed by a sentinel. The latter is how C-style strings are written, with the sentinel being a nul character.

The other technique is to output the quantity of items, followed by the items. So if I had 6 numbers, 0, 1, 2, 3, 4, 5, the output would be:
6 // The number of items
0
1
2
3
4
5

In a Store_To_Buffer method, I would create a temporary to hold the quantity, write that out, then follow with each item from the container. The matching Load_From_Buffer method reads the quantity first and then reads that many items.
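
The quantity-then-items layout can be sketched for a std::vector as follows. This is only an illustration under stated assumptions: the items are 32-bit unsigned integers, the quantity is stored as a 32-bit unsigned integer, everything is in host byte order, and C++11 is available for the range-based for loops. The free functions are overloads written for this example.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Writes the quantity, then each item, and advances the pointer.
// The caller must provide room for (items.size() + 1) * sizeof(std::uint32_t) bytes.
void Store_To_Buffer(const std::vector<std::uint32_t>& items, unsigned char *& p_buffer)
{
    // Temporary holding the quantity, written before the items themselves.
    const std::uint32_t quantity = static_cast<std::uint32_t>(items.size());
    std::memcpy(p_buffer, &quantity, sizeof(quantity));
    p_buffer += sizeof(quantity);
    for (std::uint32_t item : items) {
        std::memcpy(p_buffer, &item, sizeof(item));
        p_buffer += sizeof(item);
    }
}

// Reads the quantity, then that many items, and advances the pointer.
void Load_From_Buffer(std::vector<std::uint32_t>& items, unsigned char const *& p_buffer)
{
    std::uint32_t quantity = 0;
    std::memcpy(&quantity, p_buffer, sizeof(quantity));
    p_buffer += sizeof(quantity);
    items.assign(quantity, 0); // resize to the stored quantity
    for (std::uint32_t& item : items) {
        std::memcpy(&item, p_buffer, sizeof(item));
        p_buffer += sizeof(item);
    }
}
```

Since the quantity travels with the data, the reader never needs to know the size of the container in advance.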

Thomas Matthews
  • this is looking good, 1 question: Why the need for buffer _and_ p_buffer? (I'm not very good at C++ :L ) EDIT: Ignore this, it's the pointer to the array – LordAro Jan 29 '13 at 20:52
  • If you pass `buffer` to the methods, the methods will increment it and you will lose the start of the original buffer. Always best to play with an additional pointer into a buffer. – Thomas Matthews Jan 29 '13 at 20:55
  • @LordAro: Reminder: if you like the answer, click on the check mark. – Thomas Matthews Jan 29 '13 at 21:07
  • Done, thanks :) Oh: "In C and C++, the size of a structure may not be equal to the sum of the size of its members." <-- Indeed, i have already come across this 'issue' :) – LordAro Jan 29 '13 at 21:34
  • Out of interest (if you're still there), could this method also be applied to vectors? I'm having trouble finding the size of the array... – LordAro Jan 29 '13 at 21:52
  • I really get puzzled when people can't find the size of an array they created, after all, you need to specify the number of elements before it can be created. In order to output an array, you would need to write the number of elements as one item, followed by all the elements. – Thomas Matthews Jan 30 '13 at 01:28
  • Indeed i can, but it involves in/out parameters – LordAro Jan 30 '13 at 20:44

You could overload the std::ostream output operator and std::istream input operator for your structure, something like this:

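#include <cstdint>  // int32_t
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

struct Record {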
    std::string name;
    int value;
};

std::istream& operator>>(std::istream& in, Record& record) {
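    char name[41] = { 0 };  // one extra byte so the buffer is always nul-terminated
    int32_t value(0);
    in.read(name, 40);
    in.read(reinterpret_cast<char*>(&value), 4);
    record.name = name;     // stops at the first nul, dropping the padding bytes
    record.value = value;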
    return in;
}

std::ostream& operator<<(std::ostream& out, const Record& record) {
    std::string name(record.name);
    name.resize(40, '\0');
    out.write(name.c_str(), 40);
    out.write(reinterpret_cast<const char*>(&record.value), 4);
    return out;
}

int main(int argc, char **argv) {
    const char* filename("records");
    Record r[] = {{"zero", 0 }, {"one", 1 }, {"two", 2}};
    int n(sizeof(r)/sizeof(r[0]));

    std::ofstream out(filename, std::ios::binary);
    for (int i = 0; i < n; ++i) {
        out << r[i];
    }
    out.close();

    std::ifstream in(filename, std::ios::binary);
    std::vector<Record> rIn;
    Record record;
    while (in >> record) {
        rIn.push_back(record);
    }
    for (std::vector<Record>::iterator i = rIn.begin(); i != rIn.end(); ++i){
        std::cout << "name: " << i->name << ", value: " << i->value
                  << std::endl;
    }
    return 0;
}