
Just 20 minutes ago, while answering a question, I came up with an interesting scenario whose behavior I'm not sure of:

Suppose I have an integer array of size n, pointed to by intPtr:

int* intPtr;

and suppose I also have a struct like this:

typedef struct {
    int val1;
    int val2;
    // ...and possibly more int members like these (no other types)
} intStruct;

My question is about the following cast:

intStruct* structPtr = (intStruct*) intPtr;

Am I guaranteed to get every element correctly if I traverse the members of the struct? Is there any possibility of misalignment (for example, because of padding) on any architecture/compiler?

too honest for this site
Seçkin Savaşçı
  • Interesting question! Nowadays most compilers would not introduce any padding in the struct, because the integers are 4 bytes long and array starts are almost certainly aligned... But I don't think there is any guarantee anywhere saying that such padding cannot exist in general. Thus, on certain hypothetical architectures, I think the above code can fail. – dsign Aug 27 '12 at 08:08
  • I just tested it with qt Gcc, and the offsets were the same as with an array. This relation held even if some members were declared private, struct was changed to class, or even if one of the ints was changed to an array. As others have pointed out, this is still undefined behavior, so take it with a grain of salt. – Ghost2 Sep 02 '12 at 07:45

6 Answers


The standard is fairly specific that even a POD-struct (which is, I believe, the most restrictive class of structs) can have padding between members. ("There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment." This is a non-normative note, but it still makes the intent quite clear.)

For example, contrast the requirements for a standard-layout struct (C++11, §1.8/4):

An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.

...with those for an array (§8.3.4/1):

An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.

In the array, the elements themselves are required to be allocated contiguously, whereas in the struct, only the storage is required to be contiguous.

A third case, which might make the "contiguous storage" requirement make more sense, is a struct/class that is not trivially copyable or standard-layout. In this case, it's possible that the storage might not be contiguous at all. For example, an implementation might set aside one area of memory for holding all the private variables, and an entirely separate area of memory to hold all the public variables. To make that a little more concrete, consider two definitions like:

class A { 
    int a;
public:
    int b;
} a;

class B {
    int x;
public:
    int y;
} b;

With these definitions, the memory might be laid out something like:

a.a;
b.x;

// ... somewhere else in memory entirely:

a.b;
b.y;

In this case, neither the elements nor the storage needs to be contiguous, so interleaving parts of entirely separate structs/classes is allowable.
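A quick way to see which side of the line a class falls on: in C++11, a class whose non-static data members have different access control is not standard-layout, so classes like `A` and `B` above get none of the standard-layout guarantees discussed here. A minimal sketch using `std::is_standard_layout`, assuming a C++11 compiler (`AllPublic` is a hypothetical contrast type, not from the question):

```cpp
#include <type_traits>

// Same shape as class A above: one private and one public data member.
class A {
    int a;
public:
    int b;
};

// Hypothetical contrast: the same two ints, both with the same access control.
struct AllPublic {
    int a;
    int b;
};

// Mixed access control disqualifies A from being standard-layout (C++11 9/7),
// so an implementation is free to lay out A::a and A::b as it likes.
static_assert(!std::is_standard_layout<A>::value, "A is not standard-layout");
static_assert(std::is_standard_layout<AllPublic>::value, "AllPublic is standard-layout");
```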

That said, the first element must be at the same address as the struct as a whole (9.2/17): "A pointer to a POD-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa."

In your case, you have a POD-struct, so the same §9.2/17 guarantee applies: the struct's address is the address of its first member. Since the first member must be aligned, and the remaining members are all of the same type, it's impossible for any padding to be truly necessary between the other members (i.e., except for bit-fields, any type you can put in a struct you can also put in an array, where contiguous allocation of the elements is required).

If you have elements smaller than a word on a word-oriented machine, though, padding could make access somewhat simpler. For example, early DEC Alphas were (at the hardware level) only capable of reading/writing an entire (64-bit) word at a time. As such, let's consider something like a struct of four char elements:

struct foo { 
   char a, b, c, d;
};

If it were required to lay these out contiguously in memory, accessing foo::b (for example) would require the CPU to load the word, shift it right by 8 bits, then mask it to zero-extend that byte to fill the entire register.

Storing would be even worse: the CPU would have to load the current value of the whole word, mask out the current contents of the appropriate char-sized piece of it, shift the new value into the correct place, OR it into the word, and finally store the result.

By contrast, with padding between the elements, each of those becomes a simple load/store, with no shifting, masking, etc.
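The two access sequences described above can be emulated in portable code. This is only a sketch, assuming a machine that can load and store nothing smaller than a 64-bit word and that numbers bytes little-endian; `load_byte` and `store_byte` are hypothetical helpers, and `store_byte` returns the new word that would then be stored back as a whole:

```cpp
#include <cstdint>

// Emulation only: assume a machine that can load/store nothing smaller than a
// 64-bit word, with bytes numbered little-endian (byte 0 = least significant).
// `index` must be in the range 0..7.

// Reading a char member: shift its byte down, then mask to zero-extend it.
constexpr std::uint8_t load_byte(std::uint64_t word, unsigned index) {
    return static_cast<std::uint8_t>((word >> (8 * index)) & 0xFFu);
}

// Writing a char member: starting from the loaded word, clear the old byte,
// shift the new value into place and OR it in. The caller then stores the
// returned word back to memory as a whole.
constexpr std::uint64_t store_byte(std::uint64_t word, unsigned index,
                                   std::uint8_t value) {
    return (word & ~(std::uint64_t{0xFF} << (8 * index)))
         | (std::uint64_t{value} << (8 * index));
}

// Under the little-endian assumption, foo::b lives in byte 1 of the word
// holding a, b, c, d.
static_assert(load_byte(0x04030201u, 1) == 0x02, "read b");
static_assert(store_byte(0x04030201u, 1, 0xAB) == 0x0403AB01u, "write b");
```

With one char per padded word, both helpers would collapse into a plain whole-word load or store.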

At least if memory serves, with DEC's normal compiler for the Alpha, int was 32 bits, and long was 64 bits (it predated long long). As such, with a struct of ints like yours, you could have expected to see another 32 bits of padding between the elements (and another 32 bits after the last element as well).

Given that you do have a POD-struct, you still have some possibilities though. The one I'd probably prefer would be to use offsetof to get the offsets of the members of the struct, create an array of them, and access the members via those offsets. I showed how to do this in a couple of previous answers.
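A minimal sketch of that offsetof approach, assuming a three-member version of the question's struct; `intStructOffsets`, `member` and `fill_from_array` are hypothetical names, and this is not the code from the previous answers mentioned above:

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Assumed three-member version of the question's struct.
struct intStruct {
    int val1;
    int val2;
    int val3;
};

// Offsets of the members in declaration order. offsetof is only
// guaranteed for standard-layout types, which intStruct is.
const std::size_t intStructOffsets[] = {
    offsetof(intStruct, val1),
    offsetof(intStruct, val2),
    offsetof(intStruct, val3),
};

const std::size_t intStructCount =
    sizeof intStructOffsets / sizeof intStructOffsets[0];

// Access member i through its recorded offset, so any padding the
// compiler may have inserted between members is skipped automatically.
int& member(intStruct& s, std::size_t i) {
    assert(i < intStructCount);
    char* base = reinterpret_cast<char*>(&s);
    return *reinterpret_cast<int*>(base + intStructOffsets[i]);
}

// Copy intStructCount ints from data into the struct, without assuming
// the struct is laid out like an int array.
void fill_from_array(intStruct& s, const int* data) {
    for (std::size_t i = 0; i < intStructCount; ++i)
        member(s, i) = data[i];
}
```

The design choice here is to traverse the struct by its real member offsets instead of pretending it is an array, so the code stays correct even if some implementation does pad between members.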

Community
Jerry Coffin
  • That architecture would not be conformant; 1.7:3 *Two or more threads of execution can update and access separate memory locations without interfering with each other*. To make such an architecture conformant, `char` would have to be 64 bits. – ecatmur Aug 29 '12 at 15:31
  • @ecatmur: Could be -- obviously DEC was long gone by the time C++11 was approved (and I doubt HP's going to put a lot of work into C++11 compliance for the Alpha). In any case, at least as I read it, "without interfering" still only requires that the modification be surrounded by a critical section (which it probably was -- the Alpha was heavily oriented toward supporting threads). – Jerry Coffin Aug 29 '12 at 15:45

Strictly speaking, such pointer casts aren't allowed and lead to undefined behavior.

The main issue with the cast is however that the compiler is free to add any number of padding bytes anywhere inside a struct, except before the very first element. So whether it will work or not depends on the alignment requirements of the specific system, and also whether struct padding is enabled or not.

int is not necessarily the same size as the optimal size for an addressable chunk of data, even though it is on most 32-bit systems. Some 32-bit systems don't care about misalignment, some allow misalignment but produce less efficient code, and some require the data to be aligned. In theory, 64-bit systems might also want to add padding after an int (which is typically 32 bits there) to get a 64-bit chunk, but in practice they support 32-bit instruction sets.

If you write code relying on this cast, you should add something like this:

static_assert(sizeof(intStruct) == sizeof(int) + sizeof(int),
              "intStruct must not contain padding");
Lundin
  • Explicit pointer (C-style or `reinterpret_cast`) casts are allowed between any two types of pointer except member pointers. No compiler error anywhere. – Jan Hudec Aug 27 '12 at 08:32
  • As far as I recall, compiler is not allowed to add padding at the beginning of the struct! In C++, only applies to plain-old-data structs. – Jan Hudec Aug 27 '12 at 08:32
  • Casting through `void*` will not make anything better at all. In C++ there are cases where casting through `void*` makes things worse, though this is not one. – Jan Hudec Aug 27 '12 at 08:33
  • @JanHudec Ok so it might not give a compiler error, but it is not "allowed", it is UB. C11 6.3.2.3/7 `A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.` – Lundin Aug 27 '12 at 08:50
  • @JanHudec Read the post again. `except before the very first element.` – Lundin Aug 27 '12 at 08:52
  • Members of a struct will be aligned as long as the struct itself is, and so will elements of an array. So all pointers will always be correctly aligned for their types. The only thing that's not guaranteed is that there won't be any extra padding. I believe padding is implementation-defined, not undefined, so if it works on a particular implementation, it's guaranteed to keep working there. – Jan Hudec Aug 27 '12 at 09:54
  • @Lundin: So, per that exception, the cast `intStruct* structPtr = (intStruct*) intPtr;` is in fact allowed and does not lead to UB. No padding, no misalignment. (There's also an explicit guarantee elsewhere, shared with C++, that covers a common initial subsequence). That directly contradicts your first sentence. – MSalters Aug 27 '12 at 09:58
  • @JanHudec: The amount of padding is unspecified. – MSalters Aug 27 '12 at 09:59

It is guaranteed to be legal, given that the element type is standard-layout. Note: all references in the following are to the standard.

8.3.4 Arrays [dcl.array]

1 - [...] An object of array type contains a contiguously allocated non-empty set of N subobjects of type T. [...]

Regarding a struct with N members of type T,

9.2 Class members [class.mem]

14 - Nonstatic data members of a (non-union) class with the same access control are allocated so that later members have higher addresses within a class object. [...] Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other [...]
20 - A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member [...] and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ]

So the question is whether any alignment-required padding within a struct could cause its members not to be contiguously allocated with respect to each other. The answer is:

1.8 The C++ object model [intro.object]

4 - [...] An object of trivially copyable or standard-layout type shall occupy contiguous bytes of storage.

In other words, a standard-layout struct a containing at least two members x, y of the same (standard-layout) type that does not respect the identity &a.y == &a.x + 1 is in violation of 1.8:4.
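The identity can be checked at compile time in its byte-offset form with `offsetof`. A minimal sketch, assuming a hypothetical two-member struct `Pair`; if an implementation did insert padding between the two `int`s, the second `static_assert` would fail to compile rather than let the code silently misbehave:

```cpp
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout

// Hypothetical struct with two members of the same type.
struct Pair {
    int x;
    int y;
};

// offsetof (and the layout guarantees quoted above) apply to standard-layout types.
static_assert(std::is_standard_layout<Pair>::value, "Pair must be standard-layout");

// Byte-offset form of &a.y == &a.x + 1: y starts exactly one int after x.
static_assert(offsetof(Pair, y) == offsetof(Pair, x) + sizeof(int),
              "unexpected padding between Pair::x and Pair::y");
```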

Note that alignment is defined (3.11 Alignment [basic.align]) as the number of bytes between successive addresses at which a given object can be allocated. It follows that the alignment of a type T can be no greater than the distance between adjacent objects in an array of T, and (since 5.3.3 Sizeof [expr.sizeof] specifies that the size of an array of n elements is n times the size of an element) alignof(T) can be no greater than sizeof(T). Thus any additional padding between adjacent struct members of the same type would not be required by alignment, and so would not be countenanced by 9.2:14.
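The facts this argument leans on can be spelled out as compile-time checks. A sketch, assuming a C++11 compiler; `size_and_alignment_consistent` is a hypothetical helper, and its last condition (size is a multiple of alignment) follows because every element of an array must be suitably aligned:

```cpp
// For a type T, checks the relations used in the argument above:
//  - an array of n T occupies exactly n * sizeof(T) bytes (5.3.3),
//  - so adjacent array elements are sizeof(T) bytes apart, which means
//    alignof(T) can be no greater than sizeof(T), and sizeof(T) must be a
//    multiple of alignof(T) for every element to be aligned.
template <typename T>
constexpr bool size_and_alignment_consistent() {
    return sizeof(T[4]) == 4 * sizeof(T)
        && alignof(T) <= sizeof(T)
        && sizeof(T) % alignof(T) == 0;
}

static_assert(size_and_alignment_consistent<char>(), "char");
static_assert(size_and_alignment_consistent<int>(), "int");
static_assert(size_and_alignment_consistent<double>(), "double");
static_assert(size_and_alignment_consistent<long double>(), "long double");
```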


With regard to AProgrammer's point, I would interpret the language in 26.4 Complex numbers [complex.numbers] as requiring that the instantiations of std::complex<T> should behave as standard-layout types with regard to the position of their members, without being required to conform to all the requirements of standard-layout types.

ecatmur
  • I can't agree. A standard layout type must use contiguous storage, but that does *not* mean the elements within it must be contiguous. It can still have padding between the elements. – Jerry Coffin Aug 29 '12 at 14:09
  • @JerryCoffin what does *contiguous* mean in 1.8:4, then, and how does that differ from 8.3.4:1? – ecatmur Aug 29 '12 at 14:32

The behavior there is almost certainly compiler-, architecture-, and ABI-dependent. However, if you're using gcc, you can make use of __attribute__((packed)) to force the compiler to pack struct members one after the other, without any padding. With that, the memory layout should match that of a flat array.
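A minimal sketch of that, assuming GCC or Clang (both accept this attribute; it is a compiler extension, not standard C++). `PackedInts` is a hypothetical stand-in for the question's struct:

```cpp
#include <cstddef>  // offsetof

// GCC/Clang extension: 'packed' removes all padding, so the members sit
// back to back exactly as the elements of an int[3] would.
struct __attribute__((packed)) PackedInts {
    int val1;
    int val2;
    int val3;
};

static_assert(sizeof(PackedInts) == 3 * sizeof(int), "no padding expected");
static_assert(offsetof(PackedInts, val2) == 1 * sizeof(int), "val2 directly follows val1");
static_assert(offsetof(PackedInts, val3) == 2 * sizeof(int), "val3 directly follows val2");
```

Packing also drops the struct's own alignment requirement, so a `PackedInts` object may itself be misaligned for `int`; recent GCC versions warn (`-Waddress-of-packed-member`) when you take the address of such a member.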

kelnos

When I searched some time ago, I found nothing that guarantees this is valid. I did find an explicit guarantee for the case of std::complex<> in C++, which could have been formulated more simply if the property were true in general, so I doubt I missed something in my search (but absence of proof is hardly proof of absence, and the standard is sometimes obscure in its wording).
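For reference, the `std::complex` guarantee mentioned above (C++11 26.4/4) lets an lvalue `z` of type `std::complex<T>` be viewed as a `T[2]` holding the real part followed by the imaginary part. A minimal sketch (`part` is a hypothetical helper); nothing comparable is stated for user-defined structs:

```cpp
#include <cassert>
#include <complex>

// C++11 26.4/4: for an lvalue z of type std::complex<T>,
// reinterpret_cast<T(&)[2]>(z)[0] is the real part and [1] the imaginary part.
// This guarantee is specific to std::complex.
double part(std::complex<double>& z, int i) {
    assert(i == 0 || i == 1);
    return reinterpret_cast<double(&)[2]>(z)[i];
}
```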

AProgrammer

The layout rules for C structs guarantee that the members are stored in declaration order, at increasing addresses, just like the elements of a C array. So order cannot be a problem.

As for alignment, since you have only one data type (int), there is no scenario in which it would be necessary to add padding to align your data members, though the compiler is permitted to do so. The compiler can add padding before the struct in memory (to align the struct object itself), but it cannot add padding at the beginning of the struct, before the first member. So if the compiler were to add padding in your situation, then instead of this:

[4-byte int][4-byte int][4-byte int]...[4-byte int]

your data structure would have to be stored like this:

[4-byte data][4-byte padding][4-byte data]...

which is unreasonable.

Overall, I think this cast should work with no problems in your situation, though I think it is bad practice to use it.

Asciiom
hamami