
I am getting a bad_alloc thrown from the code below when it is compiled with gcc (tried 4.9.3, 5.4.0 and 6.2). gdb tells me it happens on the last line, where the unordered_map is constructed from an initializer_list. If I comment out the MMX instruction _m_maskmovq, there is no error. Similarly, if I comment out the initialization of the unordered_map, there is no error. Only when invoking the MMX instruction and initializing the unordered_map with an initializer_list do I get the bad_alloc. If I default construct the unordered_map and call map.emplace(1,1), there is also no error. I've run this on a CentOS 7 machine with 48 cores (Intel Xeon) and 376 GB RAM, and also on a Dell laptop (Intel Core i7) under Ubuntu WSL, with the same result. What is going on here? Is the MMX instruction corrupting the heap? Valgrind didn't seem to identify anything useful.

Compiler command and output:

$g++ -g -std=c++11 main.cpp
$./a.out
   terminate called after throwing an instance of 'std::bad_alloc'
   what():  std::bad_alloc
   Aborted

Source code (main.cpp):

#include <immintrin.h>
#include <unordered_map>

int main()
{
  __m64 a_64 = _mm_set_pi8(0,0,0,0,0,0,0,0);
  __m64 b_64 = _mm_set_pi8(0,0,0,0,0,0,0,0);
  char dest[8] = {0};
  _m_maskmovq(a_64, b_64, dest);

  std::unordered_map<int, int> map{{ 1, 1}};
}

Update: The _mm_empty() workaround does fix this example. It doesn't seem like a viable solution for multithreaded code where one thread is executing vector instructions and another is using an unordered_map. Another interesting point: if I turn on optimization with -O3, the bad_alloc goes away. Fingers crossed we never hit this error in production (cringe).

Eric Roller

1 Answer


There is no heap corruption. This happens because std::unordered_map uses long double internally, for computing the bucket count from the number of elements in the initializer (see _Prime_rehash_policy::_M_bkt_for_elements in the libstdc++ sources).
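To make the FPU dependency concrete, here is a minimal sketch of the kind of computation involved. The function name `buckets_for_elements` is hypothetical and the body is a simplified stand-in, not the libstdc++ source. On x86 targets, GCC normally performs `long double` arithmetic with x87 instructions, which is why this step depends on the FPU state being usable.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical, simplified stand-in for a bucket-count computation.
// It is NOT the libstdc++ implementation; it only shows the shape of the
// calculation: a division and a ceil() carried out in long double.
std::size_t buckets_for_elements(std::size_t n, float max_load_factor)
{
    assert(max_load_factor > 0.0f);
    // Smallest bucket count that keeps the load factor within the limit.
    long double needed =
        std::ceil(static_cast<long double>(n) / max_load_factor);
    return static_cast<std::size_t>(needed);
}
```

If the division or the ceil() yields a garbage value because the x87 state is unusable, the converted bucket count can be enormous, and an allocation of that size would plausibly fail with bad_alloc.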

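It is necessary to call _mm_empty before switching from MMX code to FPU code. This has to do with a historic decision to reuse the FPU registers for the MMX register file (sort of the opposite of register renaming in modern CPUs). MMX instructions leave the aliased x87 registers marked as in use, so x87 code that runs afterwards finds its register stack full and can produce invalid results; here, presumably, a nonsensical bucket count that ends in the bad_alloc. _mm_empty (the EMMS instruction) marks the registers as empty again.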

The exception goes away if the _mm_empty call is added:

…
  _m_maskmovq(a_64, b_64, dest);
  _mm_empty();
  std::unordered_map<int, int> map{{ 1, 1}};
…
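When the MMX code has several exit paths, one way to avoid forgetting the call is a small scope guard. The following is only a sketch, assuming an x86 target and GCC or a compatible compiler; `MmxStateGuard`, `masked_store` and `build_map_after_mmx` are hypothetical names introduced for illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <immintrin.h>
#include <unordered_map>

// Hypothetical RAII helper: calls _mm_empty() when the scope ends,
// so the x87 state is usable again on every exit path.
struct MmxStateGuard {
    MmxStateGuard() = default;
    MmxStateGuard(const MmxStateGuard&) = delete;
    MmxStateGuard& operator=(const MmxStateGuard&) = delete;
    ~MmxStateGuard() { _mm_empty(); }
};

// Hypothetical wrapper around the MMX part of the question's code.
void masked_store(char* dest)
{
    assert(dest != nullptr);
    MmxStateGuard guard;  // _mm_empty() runs when this function returns
    __m64 a_64 = _mm_set_pi8(0, 0, 0, 0, 0, 0, 0, 0);
    __m64 b_64 = _mm_set_pi8(0, 0, 0, 0, 0, 0, 0, 0);
    _m_maskmovq(a_64, b_64, dest);
}

// Runs the MMX code first, then builds the map from an initializer list.
std::size_t build_map_after_mmx()
{
    char dest[8] = {0};
    masked_store(dest);
    std::unordered_map<int, int> map{{1, 1}};
    return map.size();
}
```

The x87/MMX register state belongs to the thread that executes the instructions, so the guard only needs to live in the code that runs the MMX intrinsics; a map constructed on a different thread does not see that state.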

See GCC PR 88998, as identified by cpplearner.

There is ongoing work to implement the MMX intrinsics with SSE on x86-64, which will make this issue disappear because SSE instructions do not affect the FPU state and vice versa.

Florian Weimer
  • Thank you. I've verified the workaround. Some followup points: - `_m_maskmovq` is listed as an [SSE instruction](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_m_maskmovq&expand=3523), not MMX. This was confusing. - the bad_alloc only happens with initializer_list. Any idea why? - if I have multithreaded code then I don't see how the _mm_empty will help me? – Eric Roller Feb 11 '19 at 19:08
  • A computation inside the constructor uses `long double`, see the GCC PR I referenced. The web page you reference says *CPUID Flags: SSE*, which tells you how to check for support of the instruction. It does not mean that the function is in the SSE set of instructions (and it is not, because it operates on MMX registers). – Florian Weimer Feb 11 '19 at 19:14
  • The GCC PR referenced uses the default constructor of unordered_map. Is the code in that PR getting optimized to use the initializer_list constructor then? When I change my code to use the default constructor with subsequent calls to insert, the bad_alloc goes away. – Eric Roller Feb 11 '19 at 19:28
  • `_M_bkt_for_elements` uses `long double` and thus the FPU. Apparently, it is only used with initializer lists. – Florian Weimer Feb 11 '19 at 21:55