
When I access a memory-mapped file through a pointer to a structure type that contains padding holes, the structure members may not line up with the intended data. For example:

#include <stdio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

typedef union{
    int a;
    char c[4];
}INT;

typedef struct{
    char type;
    INT data;
}RECORD;

int main(){
    int fd;
    RECORD *recPtr;
    fd = open("./f1", O_RDWR);
    if (fd == -1){
            printf("Open Failed!\n");
            return 1;
    }
    /* sizeof yields a size_t, so print it with %zu */
    printf("Size of RECORD: %zu\n", sizeof(RECORD));
    recPtr = (RECORD *)mmap(0, 2*sizeof(RECORD), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (recPtr == MAP_FAILED){
            printf("Map Failed!\n");
            return 1;
    }
    printf("type: %c, data: %c%c%c%c\n", recPtr->type, recPtr->data.c[0], recPtr->data.c[1], recPtr->data.c[2], recPtr->data.c[3]);
}

If the file "f1" contains the following data:

012345678

The above program gives this output:

Size of RECORD: 8
type: 0, data: 4567

since the characters 123 are eaten up by the structure holes.
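
The size of the hole can be checked directly with `offsetof`. The following is only a minimal sketch that repeats the two typedefs from the program above; the helper name `record_hole_size` is made up for illustration, and the value it returns depends on the compiler and target.

```c
#include <stddef.h>

typedef union {
    int a;
    char c[4];
} INT;

typedef struct {
    char type;
    INT data;
} RECORD;

/* Number of padding bytes the compiler placed between `type` and `data`.
 * The result is implementation-specific; it is not fixed by the C standard. */
size_t record_hole_size(void)
{
    return offsetof(RECORD, data) - sizeof(char);
}
```

A hole of 3 bytes would be consistent with the output shown above, where `data` starts at the character 4.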

Is there a way to avoid this without using the pragma pack directive and without changing the order of the elements in the structure?

John Zwinck
Karun

2 Answers


You basically have the following options:

  1. Accept the padding. This is fine (and the fastest option) as long as your data does not need to be portable across architectures.
  2. Use __attribute__((packed)) or similar to control the padding inserted by the compiler (recommended, but requires compiler extensions).
  3. Manually access the data at the byte level, without using structs. E.g.:

    char type;
    int data;
    
    memcpy(&type, ((char *)recPtr), 1);
    memcpy(&data, ((char *)recPtr) + 1, sizeof(data));
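
Option 3 can be wrapped in a pair of helpers that work in both directions. This is a rough sketch, not a definitive implementation: the names `record_load`, `record_store` and `RECORD_DISK_SIZE` are hypothetical, and it assumes each record in the file is one type byte followed immediately by an `int` in the machine's native byte order.

```c
#include <stddef.h>
#include <string.h>

/* Assumed on-disk record size: 1 type byte + sizeof(int) data bytes, no padding. */
#define RECORD_DISK_SIZE (1 + sizeof(int))

/* Copy record number `index` out of the mapped region into ordinary variables. */
void record_load(const void *map, size_t index, char *type, int *data)
{
    const unsigned char *p = (const unsigned char *)map + index * RECORD_DISK_SIZE;
    memcpy(type, p, 1);
    memcpy(data, p + 1, sizeof *data);
}

/* Copy the values back into the mapped region at record number `index`. */
void record_store(void *map, size_t index, char type, int data)
{
    unsigned char *p = (unsigned char *)map + index * RECORD_DISK_SIZE;
    memcpy(p, &type, 1);
    memcpy(p + 1, &data, sizeof data);
}
```

With the mapping from the question, `record_load(recPtr, 0, &type, &data)` would read the first record. After a `record_store`, the change lands in the `MAP_SHARED` mapping, and `msync` can be called to flush it to the file.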
    
bdonlan
  • If one wants to write robust software, always use method 3. The first two methods are not reliable. Padding may change not just with the architecture but also with the compiler version. Also, some architectures, like SPARC, enforce strict padding rules, so a compiler packing attribute may, or rather must, be ignored on such architectures. – datenwolf Jun 19 '11 at 20:55
  • Thanks, option 3 is helpful as far as reading is concerned; however, I am still looking for a good way to change the mapped memory and sync it back, without using any kind of packing. – Karun Jun 19 '11 at 21:09
  • You can run those memcpys in the other direction to write back – bdonlan Jun 19 '11 at 23:42

Reading binary data directly into structures is a recipe for disaster. It means you're making assumptions about the structure of some input without verification; of course, you could check the structure for integrity afterwards. But more often than not you'll have to make architecture-dependent adjustments to the input data. Think little endian vs. big endian, different word lengths, packing rules, etc.

To make a long story short: Don't fall for the dark side and its seductive promise of quick hacks.

The only proper way to read a file is to read it octet by octet; you can read larger chunks into a buffer, of course, but you should then process them by looking at every single bit. If you worry about performance, you should read Volume 1 and what's been released so far of Volume 4 of "The Art of Computer Programming", which explains in depth how to process data streams efficiently without neglecting any data.
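
As a rough illustration of processing a buffer octet by octet, the sketch below decodes one record with an explicit byte order and a length check. The layout is an assumption made for this example (one type octet followed by a 32-bit little-endian value), and the names `ParsedRecord`, `load_le32` and `parse_record` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* In-memory form of a record; its layout is independent of the file format. */
typedef struct {
    char type;
    uint32_t data;
} ParsedRecord;

/* Assumed file format: 1 type octet + 4 data octets, little endian. */
#define WIRE_RECORD_SIZE 5u

/* Assemble a 32-bit value from four octets, least significant octet first. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Returns 0 on success, -1 if the buffer is too short to hold a record. */
int parse_record(const unsigned char *buf, size_t len, ParsedRecord *out)
{
    if (len < WIRE_RECORD_SIZE)
        return -1;
    out->type = (char)buf[0];
    out->data = load_le32(buf + 1);
    return 0;
}
```

Because the value is built with shifts rather than copied, the result is the same on little-endian and big-endian hosts, and struct padding never enters the picture.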

Or use Google's Protocol Buffers.

datenwolf
  • I'm glad my .ELF files are not read octet by octet. They are loaded using `mmap()` and the OS does a `jump` to the first instructions... If the files are not to be shared between architectures, you probably don't have to worry too much about that part. If the compiler changes the padding, though, that's more problematic. – Alexis Wilke Oct 09 '19 at 10:05