44

For example, say I have two equivalent structs a and b in different projects:

typedef struct _a
{
    int a;
    double b;
    char c;
} a;

typedef struct _b
{
    int d;
    double e;
    char f;
} b;

Assuming I haven't used any directives like #pragma pack and these structs are compiled on the same compiler for the same architecture with the same optimization settings, will they have identical padding between variables?

Govind Parmar
  • 24
    "on the same compiler on the same architecture" *with the same optimization settings*, then yes. :-) But relying on this seems like a code smell to me. What actual problem are you trying to solve? – Cody Gray - on strike Jun 11 '17 at 14:59
  • 11
    The problem is not the host architecture, but the target architecture. And actually the arch alone is not enough either. You need the platform. But that looks like an XY problem and your approach is doomed by design. Why do you want to rely on the same layout? If you intend to dump them to a file or transfer via some socket/etc, use proper marshalling. What makes you think the layout will not change in the future? – too honest for this site Jun 11 '17 at 15:11
  • 4
    As a sidenote: In a synchronous digital computer basically every internal change of state is deterministic. Getting non-deterministic results (e.g. true random values) is actually a major problem. You don't mean deterministic, but "guaranteed and identical layout" (packing means something else, too). – too honest for this site Jun 11 '17 at 15:33
  • 1
    @Olaf Could the mangled names of symbols in C++ anonymous namespaces be non-deterministic? – aschepler Jun 11 '17 at 17:44
  • 2
    @aschepler: Only if the compiler uses a true random function. Read my comment carefully, it is very true! And why would they need to? They just need to be unique. Pseudorandom or a hash from the input is sufficient; a simple counter could be enough (possibly with a hash-prefix, etc.). It also simplifies debugging of the compiler if they are not true random. – too honest for this site Jun 11 '17 at 17:59
  • 6
    Where does the "code smell" / "XY problem" thought come from, and how on earth does this question earn a "-1" downvote? The OP asked a clear technical question, at no point did he venture forth that he actually wants to do that or proposes this as a pattern to follow. Digging deep is a good thing. The point is not to get a free ticket for doing something like this, but to learn about all kinds of possible influences / mechanics at work which could or could not make this possible. As provided by @rici in his answer. – AnoE Jun 12 '17 at 10:28
  • 1
    @AnoE I have no idea why the question was downvoted, but the source of the "code smell"/"XY problem" thought is pretty obvious and pretty well explained by the prior comments. I'm certainly not trying to discourage Govind or anyone else from learning how things work, and no one is accusing him of asking a poor-quality question. We just want to be clear that certain things should not be assumed if you want to write quality, portable code, and this is one of them. Remember that these are *comments*, not answers, so they weren't trying to answer the question being asked. – Cody Gray - on strike Jun 12 '17 at 15:25
  • 1
    " We just want to be clear that certain things should not be assumed" but assuming that the same struct compiles to the same layout is what everyone who ever compiled separate source files into .o's to link them later does, and much more so for shared libraries. I find it a nice coincidence that the OP by chance hit exactly the one non-trivial, non-intuitive reason for the possibility that the layout could be different per the specs - by using different *names* for the attributes. Assuming a code smell / XY here is awfully close to "stupid question". I find this a very intelligent question. – AnoE Jun 12 '17 at 15:46

8 Answers

55

The compiler is deterministic; if it weren't, separate compilation would be impossible. Two different translation units with the same struct declaration will work together; that is guaranteed by §6.2.7/1: Compatible type and composite type.

Moreover, two different compilers on the same platform should interoperate, although this is not guaranteed by the standard. (It's a quality of implementation issue.) To allow inter-operability, compiler writers agree on a platform ABI (Application Binary Interface) which will include a precise specification of how composite types are represented. In this way, it is possible for a program compiled with one compiler to use library modules compiled with a different compiler.

But you are not just interested in determinism; you also want the layout of two different types to be the same.

According to the standard, two struct types are compatible if their members (taken in order) are compatible, and if their tags and member names are the same. Since your example structs have different tags and names, they are not compatible even though their member types are, so you cannot use one where the other is required.

It may seem odd that the standard allows tags and member names to affect compatibility. The standard requires that the members of a struct be laid out in declaration order, so names cannot change the order of members within the struct. Why, then, could they affect padding? I don't know of any compiler where they do, but the standard's flexibility is based on the principle that the requirements should be the minimum necessary to guarantee correct execution. Aliasing differently tagged structs is not permitted within a translation unit, so there is no need to condone it between different translation units. And so the standard does not allow it. (It would be legitimate for an implementation to insert information about the type in a struct's padding bytes, even if it needed to deterministically add padding to provide space for such information. The only restriction is that padding cannot be placed before the first member of a struct.)

A platform ABI is likely to specify the layout of a composite type without reference to its tag or member names. On a particular platform, with a platform ABI which has such a specification and a compiler documented to conform to the platform ABI, you could get away with the aliasing, although it would not be technically correct, and obviously the preconditions make it non-portable.
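Whether a given implementation lays out the two structs identically can be checked at compile time rather than assumed. The following is a minimal sketch, assuming a C11 compiler; the tags `rec_a` and `rec_b` are stand-ins for the question's `_a` and `_b` (chosen to avoid reserved identifiers). It only confirms the layout on the implementation that compiles it and does not make the two types compatible in the standard's sense.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the two structs from the question. */
struct rec_a { int a; double b; char c; };
struct rec_b { int d; double e; char f; };

/* The build fails here if this implementation lays the structs out differently. */
_Static_assert(sizeof(struct rec_a) == sizeof(struct rec_b),
               "sizes differ");
_Static_assert(_Alignof(struct rec_a) == _Alignof(struct rec_b),
               "alignments differ");
_Static_assert(offsetof(struct rec_a, a) == offsetof(struct rec_b, d),
               "offset of first member differs");
_Static_assert(offsetof(struct rec_a, b) == offsetof(struct rec_b, e),
               "offset of second member differs");
_Static_assert(offsetof(struct rec_a, c) == offsetof(struct rec_b, f),
               "offset of third member differs");
```

If the assertions pass, the layouts agree for this compiler, target, and set of options; the aliasing restrictions discussed above still apply.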

rici
  • Also FFI would be impossible: connecting to C interfaces from non-C languages. – Kaz Jun 11 '17 at 22:44
  • @kaz: that's not required by the standard either :-). But of course it is useful. – rici Jun 11 '17 at 22:48
  • It's conceivable that compilers insert run time information in the objects which could differ in size with the type name. That would make objects with the same user data but different types have different sizes and layouts. – Peter - Reinstate Monica Jun 12 '17 at 12:42
  • @peteraschneider: yes, that is basically what I wrote in parentheses at the end of the fifth paragraph. – rici Jun 12 '17 at 13:18
16

The C standard itself says nothing about it, so in principle you cannot be sure.

But: most probably your compiler adheres to some particular ABI, otherwise communicating with other libraries and with the operating system would be a nightmare. In that case, the ABI will usually prescribe exactly how packing works.

For example:

  • on x86_64 Linux/BSD, the System V AMD64 ABI is the reference. Its §3.1 details, for every primitive processor data type, the corresponding C type, its size, and its alignment requirement, and explains how to use this data to build the memory layout of bit-fields, structs, and unions; everything (besides the actual content of the padding) is specified and deterministic. The same holds for many other architectures, see these links.

  • ARM recommends its EABI for its processors, and it's generally followed by both Linux and Windows; the alignment of aggregates is specified in "Procedure Call Standard for the ARM Architecture Documentation", §4.3.

  • on Windows there's no cross-vendor standard, but VC++ essentially dictates the ABI, to which virtually every compiler adheres; it can be found here for x86_64, here for ARM (but for the part of interest to this question it just refers to the ARM EABI).
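The layout rule these ABIs apply to plain (non-bit-field) members can be sketched as: place each member at the next offset that is a multiple of its alignment, then round the total size up to the largest member alignment. The helper below is a hypothetical illustration of that rule, not code taken from any ABI document; `compute_layout` and `align_up` are names invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be nonzero). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) / align * align;
}

/* Given each member's size and alignment in declaration order, store the
 * member offsets in offsets[] and return the total struct size. */
size_t compute_layout(const size_t *sizes, const size_t *aligns,
                      size_t n, size_t *offsets)
{
    size_t offset = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < n; i++) {
        offset = align_up(offset, aligns[i]);   /* insert padding if needed */
        offsets[i] = offset;
        offset += sizes[i];
        if (aligns[i] > max_align)
            max_align = aligns[i];
    }
    return align_up(offset, max_align);         /* trailing padding */
}
```

With the System V AMD64 values for `int` (size 4, alignment 4), `double` (8, 8), and `char` (1, 1), this yields offsets 0, 8, and 16 and a total size of 24 for either struct in the question, since member names never enter the calculation.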

Matteo Italia
10

Any sane compiler will produce identical memory layout for the two structs. Compilers are usually written as perfectly deterministic programs. Non-determinism would need to be added explicitly and deliberately, and I for one fail to see the benefit of doing so.

However, that does not allow you to cast a struct _a* to a struct _b* and access its data via both. AFAIK, this would still be a violation of strict aliasing rules even if the memory layout is identical, because the compiler is allowed to assume that the two pointers do not alias and may reorder accesses via the struct _a* with accesses via the struct _b*, which would result in unpredictable, undefined behavior.
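Two ways to move data between the structs without dereferencing a mistyped pointer are sketched below. The tags `rec_a` and `rec_b` are stand-ins for the question's structs, and the function names are hypothetical. The member-wise conversion is portable; the `memcpy` variant additionally assumes that the layouts match, which the static assertions check for the implementation at hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct rec_a { int a; double b; char c; };
struct rec_b { int d; double e; char f; };

/* Member-wise conversion: independent of padding and layout. */
struct rec_b to_rec_b(const struct rec_a *src)
{
    struct rec_b dst = { .d = src->a, .e = src->b, .f = src->c };
    return dst;
}

/* The byte-wise copy below is only meaningful if the layouts agree. */
_Static_assert(sizeof(struct rec_a) == sizeof(struct rec_b), "sizes differ");
_Static_assert(offsetof(struct rec_a, b) == offsetof(struct rec_b, e),
               "offset of second member differs");
_Static_assert(offsetof(struct rec_a, c) == offsetof(struct rec_b, f),
               "offset of third member differs");

/* Byte-wise copy: each object is only ever accessed through its own type. */
void copy_bytes_a_to_b(struct rec_b *dst, const struct rec_a *src)
{
    memcpy(dst, src, sizeof *dst);
}
```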

cmaster - reinstate monica
  • This is one reason why unions exist: so you can have something returning various structs, the exact one of which is determined at run time. For example, Xwindows events. – jamesqf Jun 11 '17 at 17:01
8

will they have identical padding between variables?

In practice, they mostly like to have the same memory layout.

In theory, since the standard doesn't say much on how padding should be employed on objects, you can't really assume anything on the padding between the elements.

Also, I can't see even why would you want to know/assume something about the padding between the members of a struct. Simply write standard-compliant C code and you'll be fine.

David Haim
  • 4
    If struct layout weren't deterministic, it would be impossible to provide a compiled, binary library, with header files for development. (At least not in cases where structs are involved.) – Kaz Jun 11 '17 at 22:46
  • 1
    In many cases, @kaz, that *is* impossible. In particular, you cannot do that with two different versions of Microsoft's compiler. Other compilers may make it a design feature to allow this type of binary compatibility, but they are not required to do so by the language standard, and many do not. That has little to do with "deterministic" vs. "non-deterministic" output, though. – Cody Gray - on strike Jun 12 '17 at 10:23
  • 3
    _"I can't see even why would you want to know/assume something about the padding between the members of a struct"_ - sometimes you might wish to use a `struct` to exactly match the layout of a network packet, for example. – Alnitak Jun 12 '17 at 10:28
  • @Alnitak even if you use a socket, it doesn't change the fact that `a` is not `b`. they are different types. – David Haim Jun 12 '17 at 10:29
  • This is not an answer. "they mostly like to have the same memory layout" -- when, and when do they not? Is it really a matter of preference? – Michael Foukarakis Jun 12 '17 at 12:42
5

You cannot rely on the layout of a structure or union in C being the same on different systems.

While the layouts generated by different compilers often appear identical, you must regard this as a convergence dictated by practical and functional convenience in compiler design, within the freedom the standard leaves to the implementation, and thus not as a guarantee.

The C11 standard (ISO/IEC 9899:2011), almost unchanged from previous standards, states clearly in paragraph 6.7.2.1, Structure and union specifiers:

Each non-bit-field member of a structure or union object is aligned in an implementation defined manner appropriate to its type.

The case of bit-fields is even worse, since great latitude is left to the implementation:

An implementation may allocate any addressable storage unit large enough to hold a bitfield. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

Just count how many times the terms 'implementation-defined' and 'unspecified' appear in the text.

Granted that checking the compiler version, machine, and target architecture on every run, before using a structure or union generated on a different system, is impractical, the above should be a decent answer to your question.

That said, there is a workaround.

To be clear, it is not a definitive solution, but it is a common approach that you can find wherever data structures are exchanged between different systems: pack the structure elements with a packing value of 1 (the standard char size).

Packing combined with a careful structure definition can yield a sufficiently reliable declaration that can be used on different systems. Packing forces the compiler to remove implementation-defined alignment, reducing the incompatibilities that the standard permits. Moreover, by avoiding bit-fields you can remove the remaining implementation-dependent inconsistencies. Finally, the access efficiency lost to missing alignment can be recovered by manually adding dummy members between elements, crafted to force each field back onto its correct alignment.

As a residual case, you have to consider the padding at the end of the structure that some compilers add; because no useful data is associated with it, you can ignore it (except for dynamic space allocation, but again you can deal with that).
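The packing approach with manually inserted dummy members might look like the sketch below. It assumes a compiler that supports `#pragma pack(push, 1)` (a common extension in GCC, Clang, and MSVC, not part of standard C), an 8-byte `double`, and availability of `int32_t`; the struct name `wire_record` and the `pad0`/`pad1` members are hypothetical. Byte order and floating-point representation are separate concerns that this does not address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)              /* no compiler-inserted padding */
struct wire_record {
    int32_t a;                     /* fixed-width instead of plain int */
    uint8_t pad0[4];               /* dummy bytes: keep b on an 8-byte offset */
    double  b;
    char    c;
    uint8_t pad1[7];               /* explicit tail padding */
};
#pragma pack(pop)

/* Verify that the hand-written layout is what was intended. */
_Static_assert(sizeof(double) == 8, "sketch assumes an 8-byte double");
_Static_assert(offsetof(struct wire_record, b) == 8, "b is misplaced");
_Static_assert(offsetof(struct wire_record, c) == 16, "c is misplaced");
_Static_assert(sizeof(struct wire_record) == 24, "unexpected total size");
```

Because every byte of the structure is declared explicitly, the layout no longer depends on the compiler's default alignment choices, and the assertions stop the build if an implementation disagrees.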

Frankie_C
4

ISO C says that two struct types in different translation units are compatible if they have the same tag and members. More precisely, here is the exact text from the C99 standard:

6.2.7 Compatible type and composite type

Two types have compatible type if their types are the same. Additional rules for determining whether two types are compatible are described in 6.7.2 for type specifiers, in 6.7.3 for type qualifiers, and in 6.7.5 for declarators. Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are complete types, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types, and such that if one member of a corresponding pair is declared with a name, the other member is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths. For two enumerations, corresponding members shall have the same values.

It seems very strange if we interpret it from the point of view of, "what, the tag or member names could affect padding?" But basically the rules are simply as strict as they can possibly be while allowing the common case: multiple translation units sharing the exact text of a struct declaration via a header file. If programs follow looser rules, they aren't wrong; they are just not relying on requirements for behavior from the standard, but from elsewhere.

In your example, you are running afoul of the language rules, by having only structural equivalence, but not equivalent tag and member names. In practice, this is not actually enforced; struct types with different tags and member names in different translation units are de facto physically compatible anyway. All sorts of technology depends on this, such as bindings from non-C languages to C libraries.

If both your projects are in C (or C++), it would probably be worth the effort to try to put the definition into a common header.

It's also a good idea to put in some defense against versioning issues, such as a size field:

// Widely shared definition between projects affecting interop!
// Do not change any of the members.
// Add new ones only at the end!
typedef struct a
{
    size_t size; // of whole structure
    int a;
    double b;
    char c;
} a;

The idea is that whoever constructs an instance of a must initialize the size field to sizeof (a). Then when the object is passed to another software component (perhaps from the other project), it can check the size against its sizeof (a). If the size field is smaller, then it knows that the software which constructed a is using an old declaration with fewer members. Therefore, the nonexistent members must not be accessed.
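A receiving component could implement that check roughly as follows. This is a sketch under assumptions: `rec_v1` and `rec_v2` stand for the old and new declarations of the same shared struct (in practice both would be named `a`, in different builds), and `extra`, `rec_has_extra`, and `rec_get_extra` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Older declaration of the shared struct. */
struct rec_v1 { size_t size; int a; double b; char c; };

/* Newer declaration: one member appended at the end. */
struct rec_v2 { size_t size; int a; double b; char c; double extra; };

/* True if the producer's struct was large enough to contain `extra`. */
bool rec_has_extra(const struct rec_v2 *r)
{
    return r->size >= offsetof(struct rec_v2, extra) + sizeof r->extra;
}

/* Read `extra` only when the producer actually provided it. */
double rec_get_extra(const struct rec_v2 *r, double fallback)
{
    return rec_has_extra(r) ? r->extra : fallback;
}
```

One caveat with this scheme: a new member small enough to fit into the old declaration's trailing padding would leave `sizeof` unchanged, so the size field alone would not reveal it. An explicit version number avoids that ambiguity.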

Kaz
2

Any particular compiler ought to be deterministic, but between any two compilers, or even the same compiler with different compilation options, or even between different versions of the same compiler, all bets are off.

You're much better off if you don't depend on the details of the structure, or if you do, you should embed code to check at runtime that the structure is actually laid out as you expect.

A good example of this is the recent change from 32-bit to 64-bit architectures, where even if you didn't change the size of the integers used in a structure, the default packing of partial integers changed; where previously three 32-bit integers in a row would pack perfectly, now they pack into two 64-bit slots.

You can't possibly anticipate what changes may occur in the future; if you depend on details that are not guaranteed by the language, such as structure packing, you ought to verify your assumptions at runtime.
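Such a runtime check could be sketched as below, assuming the expected values come from somewhere outside the current build, for instance a file header or a peer process. The tag `rec_a`, the `layout_expectation` struct, and `layout_ok` are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct rec_a { int a; double b; char c; };

/* Layout that the other side (a file, a peer, an older build) expects. */
struct layout_expectation {
    size_t size;
    size_t off_a;
    size_t off_b;
    size_t off_c;
};

/* Returns true if this build's layout of struct rec_a matches `want`. */
bool layout_ok(const struct layout_expectation *want)
{
    bool ok = sizeof(struct rec_a) == want->size
           && offsetof(struct rec_a, a) == want->off_a
           && offsetof(struct rec_a, b) == want->off_b
           && offsetof(struct rec_a, c) == want->off_c;

    if (!ok)
        fprintf(stderr, "struct rec_a layout mismatch: size %zu, expected %zu\n",
                sizeof(struct rec_a), want->size);
    return ok;
}
```

A program would call `layout_ok` once at startup and refuse to exchange raw structs if it returns false.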

ddyer
-1

Yes. You should always assume deterministic behaviour from your compiler.

[EDIT] From the comments below, it is obvious there are many Java programmers reading the question above. Let's be clear: C structs do not generate any name, hash, or the like in object files, libraries, or DLLs. C function signatures do not refer to them either. This means the member names can be changed at whim - really! - provided the type and order of the member variables are the same. In C, the two structures in the example are equivalent, since packing does not change, which means that the following abuse is perfectly valid in C, and there's certainly much worse abuse to be found in some of the most widely used libraries.

[EDIT2] No one should ever dare to do any of the following in C++

/* the 3 structures below are 100% binary compatible */
struct _a { int a; double b; char c; };
struct _b { int d; double e; char f; };
typedef struct SOME_STRUCT { int my_i; double my_f; char my_c[1]; } SOME_STRUCT;

struct _a a = { 1, 2.5, 'z' };
struct _b b;

/* the following is valid, copy a -> b (b starts out uninitialized) */
*(SOME_STRUCT*)&b = *(SOME_STRUCT*)&a;
assert(((SOME_STRUCT*)&a)->my_c[0] == b.f);
assert(a.c == b.f);

/* more generally these identities are always true. */
assert(sizeof(a) == sizeof(b));
assert(memcmp(&a, &b, sizeof(a)) == 0);
assert(pure_function_requiring_a(&a) == pure_function_requiring_a((struct _a*)&b));
assert(pure_function_requiring_b((struct _b*)&a) == pure_function_requiring_b(&b));

function_requiring_a_SOME_STRUCT_pointer(&a);  /* may generate a warning, but not all compilers will */
/* etc... the name space abuse is limited to the programmer's imagination */
Michaël Roy
  • 3
    True in the sense that it is unlikely that the compiler is using a random number generator, but if the C standards don't specify something then there is an element of risk in assuming this. Undefined behavior is undefined even if compilers implicitly define it. If nothing else, it makes your code less portable. – John Coleman Jun 11 '17 at 14:59
  • The default is NOT defining packing for structures, thus letting the compiler decide which will be best for the architecture. An exception to this is when a structure is used across processes, i.e. for communication, os calls, etc..., when the packing should be explicitly defined in the (usually shared) header file. If compilers were not deterministic, programming would be plainly impossible. – Michaël Roy Jun 11 '17 at 15:25
  • 2
    I don't disagree with your answer (I didn't downvote it) but I think that a warning about the dangers of writing nonportable code should be added. – John Coleman Jun 11 '17 at 15:30
  • I understand. My answer is within the context of the question, that is, within the same programming unit. Within this context, I disagree in the sense that actually forcing the packing does not make code portable. Default packing is not the same on 32-bit machines, 64-bit machines and, let's say, an Arduino, for pretty good reasons. A simple example would be that forcing packing to 1 on a purely internal structure could cause runtime alignment errors on 32-bit ARM. – Michaël Roy Jun 11 '17 at 15:41
  • 1
    Downvoted - not because I believe you are wrong, but because simply asserting a position without any references or other justification isn't particularly helpful for the asker. In certain other languages, for example, fields may be arranged so they can be addressed using a suitable hash of the member name, which would obviously differ between the {a,b,c} struct and the {d,e,f} struct. You should explain why a C compiler is not permitted to do the same. – Toby Speight Jun 12 '17 at 08:46
  • To be frank, I would not have answered your comment if you hadn't mentioned hashes and struct member names. The asker asked for an answer within a very specific context. I don't believe he was asking for a 100-page manual on structure packing; he would have gone to the manual instead. Which I suggest you do. Any decent C programmer knows that ANY struct containing, in order, an int, a double, and a char can be pointer-cast to ANY struct containing, in order, an int, a double, and a char, provided the packing is the same for both. Member names don't matter. C libraries don't export structs. – Michaël Roy Jun 12 '17 at 12:28
  • I very much like your answer. (Even though it says little about the actual question -- the two structs may be perfectly deterministically incompatible.) I do take issue with your comment **"C libraries don't export structs"** however which strikes me as obviously incorrect. One 30+ years old example is [here](http://pubs.opengroup.org/onlinepubs/009696699/basedefs/sys/socket.h.html): **"The header shall define the sockaddr_storage structure."** – Peter - Reinstate Monica Jun 12 '17 at 12:51
  • assert or not, you violated the strict aliasing rules. GCC will be happy to screw your code. – David Haim Jun 12 '17 at 13:26
  • Keil would not like it either.... :) But I can think of many others that would have let much worse go through. – Michaël Roy Jun 12 '17 at 13:37
  • Peter. The header defines the structs. But they are not mentioned in any way by the object file. Except in the debug information, but that's only there for us humans to read. – Michaël Roy Jun 12 '17 at 13:40
  • I'm not sure what your definition of the word "valid" is but many of us use it in this context to mean "conformant with the C standard". By that definition, your code is not valid. If you mean "will work with common compilers on common platforms", then I suppose the code qualifies, although it may not be best practice since it clearly violates strict aliasing and someday some compiler might find an optimisation incompatible with the intent of the code. (I don't think strict aliasing applies to uses in different TUs, so in those cases you are on less shaky ground. But still...) – rici Jun 12 '17 at 15:17
  • That's why I called my code 'abuse' :) It is definitely not recommended practice, either, even though finding such code is _still_ not uncommon. – Michaël Roy Jun 12 '17 at 15:51
  • The line I was referring to is "the following abuse is perfectly valid." It is not, imho, "valid", never mind "perfectly valid". You can indeed find such code in system libraries but system libraries are entitled to be platform- and even compiler-specific. They should not be used as a guide for programming (and indeed cannot be, because they tend to use reserved names). – rici Jun 12 '17 at 16:05
  • @Peter.... Come to think of it, <sys/socket.h> is a very good start for studying the sort of abuse I describe above... Starting with struct sockaddr_storage and its accompanying comments. This struct is a real chameleon, and sockets work because 1/ the packing is explicitly defined, and 2/ the first element is always the same size. Many other socket-related structs (mis)behave like this. And this flexibility in casting/recasting structs given by C is what makes the magic work. – Michaël Roy Jun 12 '17 at 17:28
  • @rici The aliasing is easily avoided by casting to `char *` in between. The more interesting question is "would a `memcpy` work", and it would probably on almost all platforms. – Peter - Reinstate Monica Jun 12 '17 at 21:43
  • 1
    @Peter: I think that's a misreading of the aliasing rule. (6.5/7) It is not casts which are prohibited; you can cast a pointer to any struct type to a pointer to any other struct type and back again regardless of struct compatibility. What you cannot do is *use* the cast pointer, and it doesn't help that the pointer has been "sanitized" through a `char*`. If there ends up being a `struct T*` and a `struct U*` and they both point at the same object and you dereference both of them, then you have violated the aliasing rule and the compiler might act as though the pointers were not... – rici Jun 12 '17 at 22:13
  • ...pointing at the same object (that is, it might use its deductions about the contents of a member of the object one of the pointers references without taking into account changes made through the other pointer.) – rici Jun 12 '17 at 22:14
  • @rici Hm. I thought there are two distinct issues here. One is: True, you cannot access an object through a pointer of the wrong type; that's always UB, no matter how you got the pointer. (E.g. somebody passes a `void *` into your function and you cast to the wrong type.) The other question is about aliasing -- is the compiler allowed to assume nobody is accessing the same memory through a second indirection because the second's type doesn't match, and is it consequently allowed to optimize code assuming so. But it seems like both issues are covered in 6.5 – Peter - Reinstate Monica Jun 13 '17 at 05:31