How can I convert the code below to assembly using the SSE instruction set?

for (int &elem : elems){
    int temp = 255 - elem > 0 ? 255 - elem : 0 ;
    results.push_back(temp);
}

I don't want to use C++ intrinsics.

I don't understand how to pass `elems` to the assembly code, or how to operate on multiple values in parallel using the SSE instruction set.

Remy Lebeau
  • Just give the compiler a hint about which processor architecture it can target: `-march=....` (for gcc/clang), and enable optimizations with `-O2`. – Marek R Sep 13 '22 at 13:29
  • [Demo](https://godbolt.org/z/oYMTafT36) – Marek R Sep 13 '22 at 13:39
  • @MarekR it only half worked: GCC used AVX2 there, but only for one `int` at a time. – harold Sep 13 '22 at 15:21
  • Stop using `.push_back` inside your loop if you want the compiler to vectorize, as we said in the comments on [your last question about this](https://stackoverflow.com/questions/73698461/how-to-load-array-elements-in-mmx-or-sse-registers-to-do-sum-operation-on-them). Also, you need `-O3` for full vectorization; `-O2` enables vectorization only in very easy cases. Why does this need to be conditional? I thought your input elements were in the 0..255 range (so you could pack them into 8-bit elements and get 4x the work done per SIMD vector). – Peter Cordes Sep 13 '22 at 16:34
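
As a rough illustration of the advice in the comments above, here is a minimal sketch of the same loop written without `.push_back`, so that the compiler has a chance to auto-vectorize it. The function name `invert_clamped` is hypothetical (it does not appear in the question), and the sketch assumes `elems` and `results` are `std::vector<int>`, which the question does not state.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (name not from the question): computes
// max(255 - x, 0) for every element. The output is sized once up front
// and written by index, so the loop body contains no push_back call
// and no possible reallocation.
std::vector<int> invert_clamped(const std::vector<int>& elems) {
    std::vector<int> results(elems.size());
    const int* in = elems.data();
    int* out = results.data();
    const std::size_t n = elems.size();
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::max(255 - in[i], 0);
    }
    return results;
}
```

Whether this loop is actually vectorized depends on the compiler, its version, and the flags used (for example `-O3` together with a suitable `-march=` option on gcc/clang), so it is worth inspecting the generated assembly rather than assuming the result.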
