2

I'm coding a network layer protocol and need to find the packed size of a structure defined in C. Compilers may add extra padding bytes, which makes sizeof useless in my case. I searched Google and found that __attribute__((packed)) can be used to prevent the compiler from adding padding bytes. But I believe this is not a portable approach, and my code needs to support both Windows and Linux.

Currently, I've defined macros that map each structure in my code to its packed size. Consider the code below:

typedef struct {
...
} a_t;

typedef struct {
...
} b_t;

#define SIZE_a_t 8
#define SIZE_b_t 10

#define SIZEOF(XX) SIZE_##XX

Then, in the main function, I can use the macro like this:

int size = SIZEOF(a_t);

This approach does work, but I believe it may not be the best one. Any suggestions or ideas on how to solve this problem efficiently in C?

Example

Consider the C structure below:

typedef struct {
   uint8_t  a;
   uint16_t b;
} e_t;

Under Linux, sizeof returns 4 bytes instead of 3. To work around this, I'm currently doing the following:

typedef struct {
   uint8_t  a;
   uint16_t b;
} e_t;

#define SIZE_e_t 3
#define SIZEOF(XX) SIZE_##XX

Now, when I call SIZEOF(e_t) in my function, it returns 3, not 4.

Shivam
  • You need to handle serialization/deserialization explicitly anyway to enforce a particular byte ordering, and those routines already would have to know exactly how many bytes to read/write. – jamesdlin May 26 '12 at 23:23
  • Agreed, so I should define a size macro for every structure used in the protocol. There are not many structures, about 20 or 30. I was just concerned whether this is a novel approach? – Shivam May 26 '12 at 23:25
  • In your example, the padding in the structure is *not* at the end, it's in between the two fields. So it's no use writing your code to use the size `3` instead of the size `sizeof(e_t)`, no matter what macro you write to represent the number `3`. Your struct will still fail to represent whatever message it's supposed to send/receive over the network. – Steve Jessop May 26 '12 at 23:28
  • What's the reason why you don't like the `SIZE_##XX` approach? Is it because it's tedious/hard to scale to a large number of structs, or something else? – user541686 May 26 '12 at 23:37
  • I'm currently using this approach. I'm still relatively new to C compared to the people here, so I was just looking for some suggestions from the community :). – Shivam May 26 '12 at 23:39

4 Answers

4

sizeof is the portable way to find the size of a struct, or of any other C data type.

The problem you're facing is how to ensure that your struct has the size and layout that you need.

#pragma pack or __attribute__((packed)) may well do the job for you. Neither is 100% portable (there's no mention of packing in the C standard), but they may be portable enough for your current purposes. Do consider whether your code might need to be ported to some other platform in the future. Packing is also potentially unsafe; see this question and this answer.

The only 100% portable approach is to use arrays of unsigned char and keep track of which fields occupy which ranges of bytes. This is a lot more cumbersome, of course.
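As a rough illustration of the byte-array approach, here is a minimal sketch for the question's e_t. The wire layout (a in byte 0, b in bytes 1 and 2, big-endian) and the names E_WIRE_SIZE, e_pack and e_unpack are assumptions made for this example, not part of the original protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed wire format: 1 byte for a, then b as 2 bytes, big-endian. */
enum { E_WIRE_SIZE = 3 };

typedef struct {
    uint8_t  a;
    uint16_t b;
} e_t;

/* Copy the fields one by one; struct padding never reaches the buffer. */
void e_pack(const e_t *in, unsigned char out[E_WIRE_SIZE])
{
    assert(in != NULL && out != NULL);
    out[0] = in->a;
    out[1] = (unsigned char)(in->b >> 8);    /* high byte of b */
    out[2] = (unsigned char)(in->b & 0xFF);  /* low byte of b */
}

/* Rebuild the struct from the three wire bytes. */
void e_unpack(const unsigned char in[E_WIRE_SIZE], e_t *out)
{
    assert(in != NULL && out != NULL);
    out->a = in[0];
    out->b = (uint16_t)(((unsigned)in[1] << 8) | in[2]);
}
```

With this layout the number of bytes sent is always E_WIRE_SIZE, whatever sizeof(e_t) happens to be on a given compiler.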

Community
Keith Thompson
  • Thanks, yes layout is the problem that I'm facing. For now, I'll look into `#pragma pack` or `__attribute__((packed))` – Shivam May 27 '12 at 00:02
2

Your macro tells you the size that you think the struct should have, if it has been laid out as you intend.

If that's not equal to sizeof(a_t), then whatever code you write that thinks it is packed isn't going to work anyway. Assuming they're equal, you might as well just use sizeof(a_t) for all purposes. If they're not equal then you should be using it only for some kind of check that SIZEOF(a_t) == sizeof(a_t), which will fail and prevent your non-working code from compiling.

So it follows that you might as well just put the check in the header file that sizeof(a_t) == 8, and not bother defining SIZEOF.

That's all aside from the fact that SIZEOF doesn't really behave like sizeof. For example consider typedef a_t foo; sizeof(foo);, which obviously won't work with SIZEOF.
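The compile-time check described above could be sketched as follows, assuming a C11 compiler (static_assert from <assert.h>). The struct hdr_t, its fields, and HDR_WIRE_SIZE are hypothetical stand-ins, because the question does not show the members of a_t; the fields are chosen so that natural alignment adds no padding on common ABIs.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>

/* Hypothetical message header; not a structure from the question. */
typedef struct {
    uint32_t id;
    uint16_t type;
    uint16_t len;
} hdr_t;

enum { HDR_WIRE_SIZE = 8 };

/* Compilation stops here if the compiler lays the struct out differently. */
static_assert(sizeof(hdr_t) == HDR_WIRE_SIZE, "hdr_t has unexpected padding");
static_assert(offsetof(hdr_t, type) == 4, "hdr_t.type is at the wrong offset");
static_assert(offsetof(hdr_t, len) == 6, "hdr_t.len is at the wrong offset");
```

Checking the offsets as well as the total size catches padding between fields, which a size check alone can miss if the padding is offset by a change elsewhere.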

Steve Jessop
  • I know I can always use sizeof, but the problem is that the compiler may add extra bytes to my structure. To transfer the structure over the network, I need to know its exact size, not the one that sizeof provides. – Shivam May 26 '12 at 23:23
  • @Shivam: obviously defining a macro `SIZEOF` doesn't prevent the compiler from adding extra bytes to your structure. So I really don't see what you hope to achieve by it. I'm pretty sure that you should be asking how to pack a structure in each of the compilers you intend to use (and you must list them). There's no point asking about this `SIZEOF` macro, since it doesn't solve your problem. There's also no truly portable way to instruct a C compiler to pack a structure. – Steve Jessop May 26 '12 at 23:26
  • Yes, you are right, defining the `SIZEOF` macro does not prevent the compiler from adding extra bytes. But I could manually define the packed size of every structure that I may be using in my protocol. If you check my code above, I've defined `SIZE_a_t`, which is the actual size of `a_t` without compiler padding. So I do achieve my objective, but my question is: is it the only way to achieve it? – Shivam May 26 '12 at 23:28
  • @Shivam: No, it's *not* the actual size of `a_t`. The actual size of `a_t` is `sizeof(a_t)`. `SIZE_a_t` is the size you'd like `a_t` to be, but wishing doesn't make it so. `__attribute__((packed))` does make it so :-) – Steve Jessop May 26 '12 at 23:31
  • I know this is not the actual size of the structure, but I'm not concerned about that. I need to find the packed size of the structure in a portable way. – Shivam May 26 '12 at 23:35
  • There's nothing useful you can do with the "packed size" of the structure. Padding is not at the end. It's between elements that are not naturally aligned. Reading/writing the first N bytes, where N is the "packed size" of the structure, will NOT work. – R.. GitHub STOP HELPING ICE May 26 '12 at 23:44
  • Well, I need it for the protocol. This is how messages should be sent across the network. When passing any structure, I need to serialize it to char pointers and must know the exact packed size. Since padding differs between machines, I can't send the padded structure across the network. – Shivam May 26 '12 at 23:50
1

I don't think that specifying the size manually is more portable than using sizeof.

If the struct changes, your hard-coded size will be wrong.

Attribute packed is portable. In Visual Studio it is #pragma pack.
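For reference, a minimal sketch of packing the question's e_t with #pragma pack(push, 1), a form accepted by GCC, Clang and MSVC. It is a compiler extension rather than standard C, so other compilers may ignore it; the type name e_packed_t and the C11 static_assert guard are additions made for this example.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>

/* #pragma pack is a compiler extension, not standard C. */
#pragma pack(push, 1)
typedef struct {
    uint8_t  a;
    uint16_t b;
} e_packed_t;
#pragma pack(pop)

/* Guard against a compiler that silently ignores the pragma. */
static_assert(sizeof(e_packed_t) == 3, "e_packed_t is not packed");
```

Note that b now sits at an odd offset, so taking a pointer to it and dereferencing that pointer elsewhere may be a misaligned access on some targets. Packing also does nothing about byte order.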

Ruben
  • `packed` is not entirely portable. There's no mention of it in the language standard. – Keith Thompson May 26 '12 at 23:18
  • Perfect, thanks. I will look into #pragma pack. Also, is there any performance issue or any other issue with forcefully packing a structure? I believe the protocol structures will not be changing any time soon. – Shivam May 26 '12 at 23:19
1

I would recommend against trying to read/write data by overlaying it on a struct. I would suggest instead writing a family of routines which are conceptually like printf/scanf, but which use format specifiers that specify binary data formats. Rather than using percent-sign-based tags, I would suggest simply using a binary encoding of the data format.

There are a few approaches one could take, involving trade-offs among the size of the serialization/deserialization routines themselves, the size of the code necessary to use them, and the ability to handle a variety of deserialization formats. The simplest (and most easily portable) approach would be to have routines which, instead of using a format string, process items individually by taking a double-indirect pointer, reading some data type from it, and incrementing it suitably. Thus:

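uint32_t read_uint32_bigendian(uint8_t const **src)
{
  uint8_t const *p = *src;
  uint32_t tmp;

  /* Cast before shifting so the arithmetic is done in uint32_t, not int. */
  tmp  = (uint32_t)(*p++) << 24;
  tmp |= (uint32_t)(*p++) << 16;
  tmp |= (uint32_t)(*p++) << 8;
  tmp |= (uint32_t)(*p++);
  *src = p;  /* advance the caller's pointer past the four bytes read */
  return tmp;
}

...
  uint8_t buff[256];
...
  uint8_t const *buffptr = buff;
  first_word = read_uint32_bigendian(&buffptr);
  next_word = read_uint32_bigendian(&buffptr);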

This approach is simple, but has the disadvantage of having lots of redundancy in the packing and unpacking code. Adding a format string could simplify it:

#define BIGEND_INT32 "\x43"  // Or whatever the appropriate token would be
  uint8_t *buffptr = buff;
  read_data(&buffptr, BIGEND_INT32 BIGEND_INT32, &first_word, &second_word);

This approach could read any number of data items with a single function call, passing buffptr only once, rather than once per data item. On some systems, it might still be a bit slow. An alternative approach would be to pass in a string indicating what sort of data should be received from the source, and then also pass in a string or structure indicating where the data should go. This could allow any amount of data to be parsed by a single call that takes a double-indirect pointer for the source, a string pointer indicating the format of data at the source, a pointer to a struct indicating how the data should be unpacked, and a pointer to a struct to hold the target data.
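One possible shape for the format-string variant is sketched below. The answer leaves the token encoding and the signature of read_data open, so the token values, the BIGEND_INT16 token, the variadic signature and the return value are all assumptions made for illustration. The sketch does no bounds checking on the source buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-character tokens; the real encoding is left open. */
#define BIGEND_INT16 "\x42"
#define BIGEND_INT32 "\x43"

/* Reads one item per token in fmt from *src, stores each through the
   matching pointer argument, and advances *src past the bytes consumed.
   Returns the number of items read; stops at the first unknown token. */
int read_data(uint8_t const **src, const char *fmt, ...)
{
    uint8_t const *p;
    int count = 0;
    va_list ap;

    assert(src != NULL && *src != NULL && fmt != NULL);
    p = *src;

    va_start(ap, fmt);
    for (; *fmt != '\0'; ++fmt) {
        if (*fmt == BIGEND_INT16[0]) {
            uint16_t *dst = va_arg(ap, uint16_t *);
            *dst = (uint16_t)(((unsigned)p[0] << 8) | p[1]);
            p += 2;
        } else if (*fmt == BIGEND_INT32[0]) {
            uint32_t *dst = va_arg(ap, uint32_t *);
            *dst = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
                 | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
            p += 4;
        } else {
            break;  /* unknown token: stop without consuming more input */
        }
        ++count;
    }
    va_end(ap);

    *src = p;
    return count;
}
```

Because the arguments are variadic, the compiler cannot check that each pointer matches its token, so a mismatch between the format string and the argument list is the caller's responsibility.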

supercat
  • Thanks for the reply. Yes, this is currently my approach. I've written a serialization and deserialization routine for every struct in my program. – Shivam May 27 '12 at 21:01