I'm fairly new to optimization and I'm having difficulty getting AVX intrinsic types to work with STL containers. Here is an example:

    std::vector<__m256> v1;
    __m256 avx_test_data = _mm256_set_ps(1,2,3,4,5,6,7,8);
    v1.push_back(avx_test_data);

This gives a segmentation fault. A similar thing happens when I use an unordered map as well.

Am I doing something obviously incorrect? If not, is there some way to get them to work together? Thanks in advance.

MathGeek
  • I think even with C++17 that should make `new` work for over-aligned types, the allocator for `std::vector<__m256>` doesn't allocate 32-byte-aligned storage. There are several duplicates for this. IDK why C++ introduced `alignas()` without requiring full support for over-aligned types in dynamic allocators; it's a big pain. – Peter Cordes Aug 21 '19 at 18:17
  • Normally you should just write idiomatic/readable C++ code and let the *compiler* worry about optimizing it. – Jesper Juhl Aug 21 '19 at 18:22
  • @JesperJuhl I agree. Unfortunately, I'm currently in a situation (in academia) where we are optimizing for speed as opposed to readability. – MathGeek Aug 21 '19 at 18:55
  • @MathGeek In my experience, the most readable code is often what the compiler is best at optimizing, and it usually does a *much* better job at it than humans who try to hand-optimize things by writing unreadable stuff. So I'd say: try just writing the idiomatic C++ version of the code, hand it to a modern compiler with optimizations turned on, and see what you get. My guess is that it's going to be faster (in most cases) than your attempt at hand-optimizing it. – Jesper Juhl Aug 21 '19 at 19:02
  • Might look into Boost.Align's [aligned_allocator](https://www.boost.org/doc/libs/1_71_0/doc/html/align/reference.html#align.reference.classes). – Shawn Aug 21 '19 at 19:15
  • Also duplicates: https://stackoverflow.com/questions/39608172/using-stl-vector-with-simd-intrinsic-data-type (as @PeterCordes said, there are probably more duplicates or related questions). However, with C++17, this should actually be solved: https://godbolt.org/z/o6whaw (unless this is just an extension by g++/clang) – chtz Aug 21 '19 at 23:27
  • @chtz That was it! The pre-written make file was compiling the files with an older version. I can't believe I didn't check it sooner. – MathGeek Aug 21 '19 at 23:54
  • @MathGeek: note that it's much more common to use `vector<float>` and use `_mm256_loadu_ps()` on top of that. But if you want to force 32-byte alignment to allow `load_ps` instead of `loadu`, you'd need to specify a custom allocator for your `vector` type instead of C++17 doing it for you. TL:DR: normally you only use `__m256` for a few local variables inside a specific function or loop you're manually vectorizing, so it's still easy to write scalar code that touches the same data. (Which might auto-vectorize, and/or for a portable / reference version of a func for testing) – Peter Cordes Aug 22 '19 at 07:17
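
For illustration, the approach in the last comment (plain `float` storage, with `__m256` only as a local inside the hand-vectorized loop) might look roughly like the sketch below. The helper name `sum_avx` and the macro `SKETCH_AVX_TARGET` are made up for this example and are not from the thread. The macro is an assumption for GCC/Clang so that the snippet also builds without `-mavx`; the CPU still has to support AVX at run time.

```cpp
#include <immintrin.h>  // AVX intrinsics (_mm256_*)
#include <cstddef>
#include <vector>

// Assumption: on GCC/Clang this attribute enables AVX for a single function,
// so the sketch also compiles when -mavx was not passed on the command line.
#if defined(__GNUC__) && !defined(__AVX__)
#define SKETCH_AVX_TARGET __attribute__((target("avx")))
#else
#define SKETCH_AVX_TARGET
#endif

// Hypothetical helper: sums all elements of an ordinary std::vector<float>.
SKETCH_AVX_TARGET
float sum_avx(const std::vector<float>& data) {
    const float* p = data.data();
    const std::size_t n = data.size();

    __m256 acc = _mm256_setzero_ps();  // __m256 lives only in a local variable
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        // Unaligned load: p + i does not need to be 32-byte aligned.
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(p + i));
    }

    // Spill the 8 partial sums to a small array and add them up.
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (float x : lanes) total += x;

    // Scalar tail for the last n % 8 elements.
    for (; i < n; ++i) total += p[i];
    return total;
}
```

Because the container holds ordinary `float`s, the default allocator is fine under any C++ standard, and scalar code can still read the same data.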

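
If the code has to build with a pre-C++17 compiler setting, the custom allocator mentioned in the comments could be sketched along these lines. `AlignedAllocator`, `AlignedFloatVector`, and `is_aligned_32` are hypothetical names for this example. It assumes `_mm_malloc` / `_mm_free` are available through `<immintrin.h>`, as they are with GCC, Clang, and MSVC on x86. Boost.Align's `aligned_allocator`, linked above, is a ready-made alternative.

```cpp
#include <immintrin.h>  // _mm_malloc / _mm_free
#include <cstddef>
#include <cstdint>
#include <new>          // std::bad_alloc
#include <vector>

// Hypothetical minimal allocator: every block it returns is aligned to
// Alignment bytes (Alignment must be a power of two).
template <class T, std::size_t Alignment>
struct AlignedAllocator {
    using value_type = T;

    AlignedAllocator() noexcept = default;
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    // Explicit rebind is required because Alignment is a non-type parameter.
    template <class U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = _mm_malloc(n * sizeof(T), Alignment);
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { _mm_free(p); }
};

// The allocator is stateless, so any two instances compare equal.
template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return true;
}
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return false;
}

// Helper for checking the result: true if p is a multiple of 32 bytes.
inline bool is_aligned_32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// Float storage whose data() pointer is safe for _mm256_load_ps.
using AlignedFloatVector = std::vector<float, AlignedAllocator<float, 32>>;
```

The same allocator should also work as `std::vector<__m256, AlignedAllocator<__m256, 32>>` when a container of `__m256` is really wanted; when compiling as C++17 or later, the godbolt link in the comments suggests the default allocator already handles this.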