
I'm trying to use Clang in a large Visual Studio project. There's a lot of MS-specific code, including C++/CLI and MSTest, that can't be compiled with Clang, so the solution is a mix of libraries compiled by the Microsoft compiler (version 17.2 / VS 2022) and clang-cl (13.0.2).

Existing code uses AVX to optimize performance-critical bottlenecks, so there are several classes that store aligned data like

struct tx
{
  alignas(32) double m_data[12];
};

The problem is that MSVC does not always honor alignment requirements. Most of the time it aligns the data properly, but sometimes (usually for temporary variables) it places structs at non-aligned addresses. For example,

struct edge_object
{
  ...
  tx m_pos;
};

int c = sizeof(edge_object);  // 256
int a = alignof(edge_object); // 32
int b = offsetof(edge_object, m_pos); // 160

std::vector<edge_object> edges;
for (int i = 0; i < n - 1; ++i)
{
    edges.push_back(edge_object( (edge_id_t)i, test_cost_0, lower_v[i], lower_v[i + 1], tx ));
    edges.push_back(edge_object( (edge_id_t)(n + i), test_cost_0, upper_v[i], upper_v[i + 1], tx ));
}

In this code snippet, MSVC aligns the first temporary edge_object properly (for example, it shifts it by 32 bytes if I allocate a few additional variables on the stack), but it places the second temporary edge_object at an odd location: 0x78 (120) bytes away from the first temporary, which is not a multiple of 32. MSVC gets away with this because it always issues unaligned load/store instructions (even when the code explicitly asks for aligned loads/stores), so the generated code still works on a misaligned object. Clang, on the other hand, issues aligned load instructions, which fault on a misaligned address.

I started by replacing all intrinsics like _mm256_load_ps with _mm256_loadu_ps in my own vectorized code, but sadly Clang is smart enough to emit its own aligned loads when it sees alignas(32).
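To locate the construction sites that produce misplaced temporaries, one option is a debug-only check of each object's address against its declared alignment. The sketch below is an illustration under assumptions, not a fix: `is_aligned` and the checking constructors are hypothetical additions to `tx`. Because the constructors are inline in a shared header, the check runs in whichever translation unit constructs the object, which makes it most useful in the MSVC-compiled code; an optimizing Clang may assume `this` is already 32-byte aligned and fold the test away.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if address p is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Stand-in for the question's tx with debug-only alignment checks added
// (assumption: giving tx user-provided constructors is acceptable).
struct tx {
    alignas(32) double m_data[12];

    // Default constructor: zero the data and verify where it lives.
    tx() : m_data{} {
        assert(is_aligned(this, alignof(tx)) &&
               "tx constructed at a misaligned address");
    }

    // Copies (e.g. edge_object's m_pos member) go through the same check
    // by delegating to the default constructor first.
    tx(const tx& other) : tx() {
        for (int i = 0; i < 12; ++i)
            m_data[i] = other.m_data[i];
    }

    tx& operator=(const tx&) = default;
};
```

In a debug build, an assertion failure then stops at the exact place where a misaligned `tx` (or an object containing one) is created.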

So I'm wondering: is there a way to force Clang to issue only unaligned loads/stores, as MSVC and ICC do? As a potential workaround I could force Clang to do so by lowering the alignment from 32 to 8, but that would hurt performance. The MSVC approach, on the other hand, is almost as fast when the data does end up properly aligned (VMOVUPS and VMOVAPS have nearly identical performance on aligned addresses on modern CPUs), yet it does not crash when the alignment is wrong due to a compiler bug. Any suggestions?
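If the alignment-8 workaround is tried, the change has to be made in the shared header so that MSVC and clang-cl both see it. A Clang-only `#ifdef __clang__` around the alignment would give the two compilers different layouts for every type that embeds a `tx`. The sketch below illustrates the layout difference using a hypothetical template `tx_t<Align>` and a hypothetical enclosing type `holder`; neither is from the original code.

```cpp
#include <cstddef>

// Hypothetical parameterised version of tx, used only to compare layouts.
template <std::size_t Align>
struct tx_t {
    alignas(Align) double m_data[12];
};

// Hypothetical enclosing type with one small member before the tx.
template <std::size_t Align>
struct holder {
    int m_id;
    tx_t<Align> m_pos;
};

// 32-byte alignment: m_pos is padded out to offset 32, total size 128.
static_assert(offsetof(holder<32>, m_pos) == 32, "m_pos offset with alignas(32)");
static_assert(sizeof(holder<32>) == 128, "holder size with alignas(32)");

// 8-byte alignment: m_pos follows m_id at offset 8, total size 104.
static_assert(offsetof(holder<8>, m_pos) == 8, "m_pos offset with alignas(8)");
static_assert(sizeof(holder<8>) == 104, "holder size with alignas(8)");
```

Both offsets and sizes change, so code from one compiler passing such an object to code from the other would read the wrong bytes.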

Sergei Ozerov
  • Why would you even want that? The better question is: Why isn't clang placing the objects with the right alignment? You should report this as a bug to the compiler. – Goswin von Brederlow Jun 08 '22 at 16:27
  • As a side note: use edges.emplace_back(...) so you don't have a temporary at all. – Goswin von Brederlow Jun 08 '22 at 16:29
  • @GoswinvonBrederlow clang is, msvc isn’t – Taekahn Jun 08 '22 at 16:29
  • @Taekahn Then why does he want clang to use unaligned loads? If it aligns data correctly for aligned loads and uses aligned loads because they are slightly faster, then what is this question about? *How to make the compiler worse?* – Goswin von Brederlow Jun 08 '22 at 16:32
  • @GoswinvonBrederlow in essence, that is how I read it. – Taekahn Jun 08 '22 at 16:32
  • @GoswinvonBrederlow basically the idea is that it's far easier to ask compiler to always issue unaligned load/stores rather than fix all the numerous issues in MS compiler that result in invalid alignment. – Sergei Ozerov Jun 08 '22 at 16:34
  • And how would making clang worse have any effect on what the MS compiler does? Are you mixing code compiled by MSVC and clang, and MSVC passes wrongly aligned objects to the clang code? Go call MS and get your compiler fixed. You paid for it, use their support. – Goswin von Brederlow Jun 08 '22 at 16:37
  • PS: if your MSVC can't align data then you can't use alignas. You already have a workaround there. (sorry if I sound angry) – Goswin von Brederlow Jun 08 '22 at 16:38
  • @GoswinvonBrederlow everyone would love MS to fix their compiler, but right now it's broken and it's not going to be fixed soon. CLang has been pragmatic and found a way to deal with MS quirks, even of far more complicated topics than this one. Having an optional ability to be "MS-compatible" is not making compiler any worse and this particular ability is very simple. After all if ICC implemented this option, why CLang can't? It does not even affect performance noticeably. – Sergei Ozerov Jun 08 '22 at 16:45
  • Is the problem exclusive to temporary objects? Because otherwise that would mean MSVC and clang/gcc are implementing two incompatible ABIs. And as said, don't use alignas(32) and you get unaligned load/store as you found out yourself. So seems like a solved problem. Personally I would just stop mixing MSVC and clang object files. Why would you compile parts of your binary with different compilers? – Goswin von Brederlow Jun 08 '22 at 16:55
  • Apparently it affects only temporary objects; at least I haven't seen any issues with the ABI (except for SEH handling, but that's a known issue). Object layout (including alignment) is computed properly by MSVC; the compiler just fails to allocate some objects accordingly. – Sergei Ozerov Jun 08 '22 at 17:03
  • I'd love to drop MSVC code completely, but as I already mentioned in my question, Clang can't handle C++/CLI code or the MSTest framework, so it's not possible to drop MSVC completely without a lot of changes. Previously we used the Intel compiler, and while it had its own issues, it worked perfectly as a drop-in replacement for MSVC. As far as I understand, clang-cl aims to achieve the same goal. – Sergei Ozerov Jun 08 '22 at 17:06
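The `emplace_back` suggestion from the comments can be sketched as follows: each `edge_object` is constructed directly in the vector's storage, so there is no `edge_object` temporary for the compiler to misplace. Since C++17, `std::allocator` uses the aligned form of `operator new` for over-aligned types, so elements honour `alignof(edge_object)`. The `edge_object` below is a hypothetical stand-in (the real members and constructor parameters are elided in the question), and `build_edges` is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

struct tx {
    alignas(32) double m_data[12];
};

// Hypothetical stand-in for the question's edge_object.
struct edge_object {
    int m_id;
    double m_cost;
    tx m_pos;

    edge_object(int id, double cost, const tx& pos)
        : m_id(id), m_cost(cost), m_pos(pos) {}
};

// Builds 2 * (n - 1) edges, constructing each one in place inside the vector.
std::vector<edge_object> build_edges(int n, double cost, const tx& pos) {
    std::vector<edge_object> edges;
    if (n > 1)  // reserve up front so elements are not moved while filling
        edges.reserve(2 * static_cast<std::size_t>(n - 1));
    for (int i = 0; i < n - 1; ++i) {
        edges.emplace_back(i, cost, pos);      // no edge_object temporary
        edges.emplace_back(n + i, cost, pos);
    }
    return edges;
}
```

This assumes a C++17 build; under C++14 the vector's allocation is not guaranteed to respect the 32-byte alignment.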

0 Answers