1

In my recent C++ code; I found Clang generated asm code use instruction movaps to memset the object to 0.

because of this movaps instruction need memory alignment of 16;

and when i use a self allocated buffer to initialize this object, the program was core.

because of the buffer allocated not aligned with 16 bytes.

  1. how clang decide the object's alignment will be 16 bytes ?
  2. how can i hint clang to don't use 16 bytes as the alignment, instead, use 8 bytes ?

example case is :

// a.h
class A {
  A();
  X x;
  Y y;
  Z z;
};
// a.cpp
A::A(): X(), Y(),Z(){}

// main.cpp
int main() {
  char *buf = my_allocator(sizeof(A)); // buf was not aligned by 16bytes. but 8.
  A *a=new(buf)A(); // cored.
  return 1;
}
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Chinaxing
  • 8,054
  • 4
  • 28
  • 36

2 Answers2

2

According to the C++ standard the alignment requirement of an object of a given type is implementation defined, i.e. it is up to the compiler to decide the minimum alignment needed for each type. This usually coincides with requirements set by the platform ABI and the underlying processor architecture. The alignment of a compound type depends on the alignment of its members and sub types.

The standard defines no way to decrease the alignment requirement of an object so to do that you have to look for language extensions in the compiler you are using. In Clang you can use __attribute__((__packed__)) to set the alignment of a struct or class to 1. Note that this can have performance implications as the compiler may need to use extra code or slower instructions to handle the misaligned data.

My suggestion is that you change your allocator to provide memory of the appropriate alignment or always allocate 8 bytes extra and align the pointer before creating each object.

273K
  • 29,503
  • 10
  • 41
  • 64
Johan
  • 3,667
  • 6
  • 20
  • 25
-2

General rule for alignment expectation is that for each architecture there is default and maximum alignment, and anything smaller that must be aligned to boundary of closest smaller exponent of 2.

The possible variable alignments are 1, 2, 4, 8 or 16 bytes.

You take total size of class A and round it down to closest exponent of 2. That gives the alignment Clang expects the variable a has.

Size of any class is sum of sizes of all member variables of the class itself and all parent classes. The order of member variables does matter as the member variables also need to be aligned to closest smaller exponent of 2.

Start of any class on 32-bit platform is aligned to 4 byte boundary, and on 64-bit platform to 8 bytes unless the first member variable requires stricter alignment.

  • Are these the rules for Windows specifically? That's not how it works in general. On non-Windows x86-64 platforms, `alignof(A)` is the `alignof()` of its most aligned member (whether first or not), with no extra alignment required by being a struct. If its members are all char arrays or structs of chars or char arrays, `alignof(A) == 1` so it's legal for an `A*` to point at memory with any alignment; total size of `A` doesn't matter. If its members are `__m128`, then `A` will require/guarantee 16-byte alignment, otherwise not. – Peter Cordes Sep 13 '22 at 16:43
  • Structs and classes behave the same as they are technically implemented identically. Alignment of structure or class isn't the same as what compiler expect it can use to access the member variables... If you access (read or write) two consecutive members, the compiler might require stricter alignment than when accessing just a single member. – Mika Lindqvist Sep 14 '22 at 17:22
  • Alignment expectation of others than the first member of struct or class can be more than the maximum alignment the compiler can use. This is mostly issue with 32-bit platforms, but can happen with vector types even on 64-bit platforms. – Mika Lindqvist Sep 14 '22 at 17:26
  • For a local variable of struct type, yes a compiler might align more than the ABI requires, so it can efficiently use wider loads/stores. But if you have an unknown function that returns an `A*`, the compiler can't assume it's more aligned than `alignof(A)`. The rules for deriving `alignof(A)` depend on the ABI; the rules you describe sound like x86 and/or x64 Windows ([Why is the "alignment" the same on 32-bit and 64-bit systems?](//stackoverflow.com/q/55920103) shows that Windows rules do some aligning of members relative to the start of the struct, even without bumping up `alignof(A)`) – Peter Cordes Sep 14 '22 at 17:31
  • Of course, if the struct does have some `__m128` members, then `alignof(A)` will be 16, and it's the allocator's fault if it returns a pointer with less alignment than that (which appears to be what's happening in the code in the question). Unless you're saying that a struct of 3x `int64_t` gets 16-byte alignment on Windows x64? That's certainly not true in general (e.g. x86-64 System V ABI on Linux/Mac), so your answer should say if it's Windows-specific. – Peter Cordes Sep 14 '22 at 17:35
  • https://godbolt.org/z/zadG4v7bW shows Windows x64 with a class of 3 `long long` variables still has an `alignof() == 8`, so an `A*` that's only aligned by 8 is still legal to dereference. – Peter Cordes Sep 14 '22 at 17:38
  • For any non-first member of struct or class, any compiler is free to assume higher alignment if offset from start of the class or struct is multiple of alignment of first member. This is mostly used when compiler can defer modified data during compilation, for example the values are constants. Strictest ABI is definitely Android ABI that assumes alignment is full size of move, possibly worth 256 bits. For x86, only SSE vector registers require aligned access, AVX registers don't. This obviously only applies to moves between memory and registers, not between two or more registers. – Mika Lindqvist Sep 15 '22 at 22:36
  • `alignof(__m256) == 32`, and clang and GCC do choose to use `vmovaps` for it, requiring 32-byte alignment. The AVX/SSE difference you're talking about is instructions like `addps xmm0, [rdi]` vs. `vaddps xmm0, [rdi]`, where indeed SSE requires alignment for non-mov instructions, AVX doesn't. But both still have alignment-checking aligned `[v]movdqa` vs. making hardware handle misalignment with `[v]movdqu`. – Peter Cordes Sep 15 '22 at 22:43
  • Since the definition of `__m256` in clang's headers doesn't use `__attribute__((aligned(1)))`, using it promises alignment, and compilers other than MSVC/ICC use aligned moves except when folding a memory operand into a memory source for an AVX instruction. All hardware that supports AVX has efficient `[v]movups` that''s just as fast if the data does happen to be aligned, so MSVC/ICC's choice is fine for AVX, only hurting when they use `movups` in code that will run on Core2 / K10 or older. – Peter Cordes Sep 15 '22 at 22:43
  • But anyway, not sure of the relevance of your point about later members happening to be aligned. e.g. `struct { char a[4]; int b; char c[8]; };`. (Except you only talked about the *first* member; in this case the second member is the most-aligned, the one that implies a minimum `alignof` for the struct). Yes, `a[]` and `c[]` are guaranteed to be 4-byte aligned because of being at 4-byte aligned offsets in a struct that itself requires at least 4 byte alignment. But so what? `c[]` isn't aligned by 8 in the x86-64 System V ABI, the whole struct still only requires 4-byte alignment. – Peter Cordes Sep 15 '22 at 22:50
  • Of course if the struct was allocated by an 8-byte aligned allocator, then that instance of it would have more than the minimum alignment required, but it looks like the code in the question has the opposite problem: *less* alignment than the ABI rules allow the compiler to assume. In x86-64 System V, the struct as a whole can only require 16-byte alignment if one of its members has `alignof(x) == 16`. – Peter Cordes Sep 15 '22 at 22:52
  • There is clearly two concepts that need to be considered... One is the alignment of each of the struct or class members, the other is the maximum padding that compiler adds between members without requiring user to add "dummy" member variables to guarantee well-defined member offsets. This obviously excludes packed structs and classes that have no padding at all (alignment is 1 byte boundary only)... – Mika Lindqvist Sep 16 '22 at 23:15
  • This question is about why a compiler would think it could safely use a `movaps` store to a struct, given a pointer to that struct as a return value that doesn't guarantee any over-alignment. That has nothing to do with padding inside the struct. As for *guaranteeing* specific offsets for members, for a known ABI the offsets are fixed by the definition of the struct, you don't need dummy members. You can do stuff like `struct foo{ int x; alignas(8) int a[2]; }` if you want more alignment for some member, and on normal systems with 4-byte int that will require 4 bytes of padding after `x`. – Peter Cordes Sep 17 '22 at 00:07
  • Using `alignas()` might require aligning to less stricter alignment than with dummy members. This I already explained when I said offset of member can't be multiple of alignment of first member. This is obvious especially when 4-byte type is aligned on 8-byte boundary and is followed by another 4-byte type. You basically need to add dummy member between each real member for the compiler to deduce that it can't combine accesses to consecutive members. – Mika Lindqvist Oct 04 '22 at 20:56