I have this piece of code that loops over a std::array<int, 3> (see on Compiler Explorer) and checks whether an element is in the array.

#include <algorithm>
#include <iterator>
#include <array>
constexpr std::array<int, 3> arr = { 0, 1, 2};
bool forLoop(int inp) 
{
    for (int i {0}; i < arr.size(); ++i)
    {
        if (arr[i] == inp) 
        {
            return true;
        }
    }
    return false;
}
bool forEachLoop(int inp)
{
    for (int i : arr) 
    {
        if (i == inp) 
        {
            return true;
        }
    }
    return false;
}
bool STL(int inp)
{
    return std::find(arr.begin(), arr.end(), inp) != arr.end();
}

Compiled with x86-64 clang 15.0.0 and -std=c++17 -O3, both forLoop() and forEachLoop() generate:

        cmp     edi, 3
        setb    al
        ret

But STL() generates very different assembly:

        test    edi, edi
        je      .LBB2_1
        cmp     edi, 1
        jne     .LBB2_3
        lea     rax, [rip + arr+4]
        lea     rcx, [rip + arr+12]
        cmp     rax, rcx
        setne   al
        ret
.LBB2_1:
        lea     rax, [rip + arr]
        lea     rcx, [rip + arr+12]
        cmp     rax, rcx
        setne   al
        ret
.LBB2_3:
        xor     eax, eax
        cmp     edi, 2
        setne   al
        lea     rcx, [rip + arr]
        lea     rax, [rcx + 4*rax]
        add     rax, 8
        lea     rcx, [rip + arr+12]
        cmp     rax, rcx
        setne   al
        ret

I tried GCC instead, and STL() still generates much longer assembly.

When I change arr to have a different number of elements (e.g. 4), all three functions generate the same assembly.

So is this a missed optimization? Why does it happen only with 3 elements?

  • Not much different here: https://godbolt.org/z/ahc3sxMKb (but same if constexpr is removed) – huseyin tugrul buyukisik Jan 15 '23 at 11:39
  • Rather than the size of the array, it's the fact that the contents are the set of integers from 0 to N-1, meaning the asm is just doing the equivalent of "is `inp` in [0, n)". Interestingly, making the array `0, 1, 2, 3` causes `STL` to use the same trick. libc++ does it with 3 as well, so I imagine GCC's implementation is doing _something_ four elements at a time and confusing the optimizer when there are only three. – chris Jan 15 '23 at 11:41
  • @chris I agree. GCC's implementation of `std::find` does manual loop-unrolling for 4 elements. That likely causes some differences here. – Homer512 Jan 15 '23 at 12:42

1 Answer

std::find uses std::__find_if in libstdc++, which has a specialization for random access iterators. While the general implementation is a simple loop that walks the range linearly and tests one element at a time, the specialization unrolls the loop into groups of four consecutive element tests and then individually handles any remaining elements that do not fill a group of four. See the GitHub mirror.
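To make that shape concrete, here is a rough sketch of a four-way unrolled linear search. It is not libstdc++'s actual source: the names find_unrolled, contains_unrolled and demo are made up for illustration, it only works on pointers to int, and the leftover elements are tested with a plain loop for brevity.

```cpp
#include <cstddef>

// Hypothetical helper (not libstdc++ code): linear search whose main loop
// tests four consecutive elements per iteration.
constexpr const int* find_unrolled(const int* first, const int* last, int value)
{
    // Number of complete groups of four elements in [first, last).
    std::ptrdiff_t trip_count = (last - first) / 4;

    for (; trip_count > 0; --trip_count)
    {
        if (*first == value) return first;
        ++first;
        if (*first == value) return first;
        ++first;
        if (*first == value) return first;
        ++first;
        if (*first == value) return first;
        ++first;
    }

    // Test the 0 to 3 leftover elements one at a time. Where this starts
    // depends on how far the loop above advanced `first`.
    for (; first != last; ++first)
    {
        if (*first == value) return first;
    }
    return last;
}

// Hypothetical wrapper mirroring the STL() function from the question.
constexpr bool contains_unrolled(const int* first, const int* last, int value)
{
    return find_unrolled(first, last, value) != last;
}

// Same contents as `arr` in the question, as a plain array for simplicity.
constexpr int demo[] = {0, 1, 2};
static_assert(contains_unrolled(demo, demo + 3, 2), "2 is in the array");
static_assert(!contains_unrolled(demo, demo + 3, 3), "3 is not in the array");
```

With three elements the unrolled loop runs zero times and only the leftover handling does any work, which is the case the question hits.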

Apparently the compilers struggle to optimize this specialization. I am not sure whether it is written this way to be performant for larger ranges or whether the idea behind it simply no longer applies (looking through the repository, it has been there since 1998). I also don't know why specifically the compilers struggle with this partially unrolled loop. Maybe it is because the tests of the remaining elements at the end of the implementation depend on the modification of __first in the initial loop (for which I also don't see a good reason, by the way).

user17732522