4

I want to write a simple scan over an array. I have a std::vector<int> data and I want to find all array indices at which the elements are less than 9 and add them to a result vector. I can write this using a branch:

for (int i = 0; i < data.size(); ++i)
    if (data[i] < 9)
        r.push_back(i);

This gives the correct answer but I would like to compare it to a branchless version.

Using raw arrays - and assuming that data is an int array, length is the number of elements in it, and r is a result array with plenty of room - I can write something like:

int current_write_point = 0;
for (int i = 0; i < length; ++i){
    r[current_write_point] = i;
    current_write_point += (data[i] < 9);
}

How would I get similar behavior using a vector for data?

Aconcagua
user1794469
  • 2
    `data[i] < 9` is typically a branch at assembly level (although it is surely a better candidate for some `cmov` magic compared to `push_back`, which surely isn't) – Matteo Italia Aug 05 '16 at 23:12
  • How and why is the second solution better than first? – DimChtz Aug 05 '16 at 23:14
  • 3
    I'd expect the `current_write_point +=` line to produce the same code as `if (data[i] < 9) { current_write_point++; }` – Barmar Aug 05 '16 at 23:14
  • 1
    @DimChtz I think that's what he's trying to find out -- he wants to compare the code generated by the two methods. – Barmar Aug 05 '16 at 23:15
  • Have you profiled `std::partition()` followed by `std::copy()` at the split point? – Alejandro Aug 05 '16 at 23:17
  • @DimChtz: if you know in advance the maximum size *and* the compiler is smart enough to eliminate the potential branch for the comparison, it will run quite a bit faster. If you are doing this in an inner loop of a complicated algorithm with a lot of elements, it can pay off. – Matteo Italia Aug 05 '16 at 23:17

3 Answers

6

Let's see with the actual compiler output:

auto scan_branch(const std::vector<int>& v)
{
  std::vector<int> res;
  for(int i = 0; i < v.size(); ++i)
  {
    if (v[i] < 9)
    {
       res.push_back(i);
    } 
  }
  return res;
}

This code clearly has a branch at the 26th line of the disassembly. If the element is greater than or equal to 9, execution simply continues with the next element; if it is less than 9, a sizeable amount of code runs for the push_back before the loop continues. Nothing unexpected.

auto scan_nobranch(const std::vector<int>& v)
{
  std::vector<int> res;
  res.resize(v.size());

  int insert_index = 0;
  for(int i = 0; i < v.size(); ++i)
  {
    res[insert_index] = i;
    insert_index += v[i] < 9;
  }

  res.resize(insert_index);
  return res;
}

This one, however, only has a conditional move, which you can see at the 190th line of the disassembly. It looks like we have a winner: there are no branches in the loop body (only the for condition check), and a conditional move cannot be mispredicted the way a branch can, so it does not cause misprediction stalls.
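
Before comparing the two versions for speed, it is worth confirming that they return the same indices. A minimal sketch, assuming both loops are wrapped in functions that take the threshold as a parameter (the names `indices_branch`, `indices_nobranch`, and `threshold` are hypothetical and not taken from the code above):

```cpp
#include <cstddef>
#include <vector>

// Branching version: appends an index only when the element is below the threshold.
std::vector<int> indices_branch(const std::vector<int>& v, int threshold)
{
    std::vector<int> res;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] < threshold)
            res.push_back(static_cast<int>(i));
    return res;
}

// Branchless-style version: always stores the index, then advances the write
// position by 0 or 1 depending on the result of the comparison.
std::vector<int> indices_nobranch(const std::vector<int>& v, int threshold)
{
    std::vector<int> res(v.size());   // room for the worst case (every element matches)
    std::size_t write = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        res[write] = static_cast<int>(i);
        write += (v[i] < threshold);
    }
    res.resize(write);                // drop the slots that were never kept
    return res;
}
```

Whether the second loop really compiles without a branch depends on the compiler and the optimization level, so checking the disassembly is still necessary.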

Fatih BAKIR
0
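// Note: this copies the matching elements themselves, not their indices
std::copy_if(std::begin(data), std::end(data), std::back_inserter(r), [](int x) { return x < 9; });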
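
std::copy_if takes a predicate as its fourth argument and copies the elements themselves. Since the question asks for indices, one possible variant is to run it over a range of indices instead. A minimal sketch, assuming a temporary index vector built with std::iota (the helper name `indices_below` and the parameter `limit` are hypothetical):

```cpp
#include <algorithm>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the indices i for which data[i] < limit.
std::vector<int> indices_below(const std::vector<int>& data, int limit)
{
    // Build the candidate indices 0, 1, ..., data.size() - 1.
    std::vector<int> idx(data.size());
    std::iota(idx.begin(), idx.end(), 0);

    // Keep only the indices whose element satisfies the predicate.
    std::vector<int> r;
    std::copy_if(idx.begin(), idx.end(), std::back_inserter(r),
                 [&](int i) { return data[i] < limit; });
    return r;
}
```

This is not a branchless approach: copy_if tests the predicate for each element and back_inserter calls push_back, so it is mainly a readable baseline to compare against.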
Pete Becker
  • Although this code may help to solve the problem, it doesn't explain _why_ and/or _how_ it answers the question. Providing this additional context would significantly improve its long-term value. Please [edit] your answer to add explanation, including what limitations and assumptions apply. – Toby Speight Aug 09 '16 at 09:32
-2

Well, you could just resize the vector beforehand and keep your algorithm:

// Resize the vector so you can index it normally
r.resize(length);

// Do your algorithm like before
int current_write_point = 0;
for (int i = 0; i < length; ++i){
    r[current_write_point] = i;
    current_write_point += (data[i] < 9);
}

// Afterwards, current_write_point holds the number of indices kept,
// so use it to shrink the vector and drop the unused elements
r.resize(current_write_point);

If you want to avoid the comparison against 9 as well, you can use bitwise operations instead.

A negative integer has its sign bit set, so it is easy to determine whether x < 0. Since x < 9 exactly when x - 9 < 0, we can subtract 9 from x and test the sign of the result:

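#include <iostream>
#include <vector>

// Returns 1 if x is negative and 0 otherwise by extracting the sign bit
// (assumes a 32-bit int)
int n(int x) {
    return static_cast<int>(static_cast<unsigned>(x) >> 31);
}

int main() {
    std::vector<int> data{3, 12, 9, -4, 8};  // sample input
    std::vector<int> r(data.size());         // room for every index
    const int length = static_cast<int>(data.size());

    int current_write_point = 0;
    for (int i = 0; i < length; ++i){
        r[current_write_point] = i;
        // data[i] - 9 is negative exactly when data[i] < 9
        // (assuming the subtraction does not overflow)
        current_write_point += n(data[i] - 9);
    }
    r.resize(current_write_point);           // keep only the indices written

    for (int index : r)
        std::cout << index << ' ';
    std::cout << '\n';
}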
Franko Leon Tokalić
  • 2
    Don't forget to use `current_write_point` to shrink the `vector` afterwards. – Nicol Bolas Aug 05 '16 at 23:24
  • 1
    There are still 2 compares: `(data[i] < 9)` and `i < length`. – Thomas Matthews Aug 05 '16 at 23:28
  • 2
    @ThomasMatthews well, I'm just giving OP a way to use a vector with his algorithm, as he/she wondered how to do that. As far as I understood, OP wants to compare this kind of code to the code with an if – Franko Leon Tokalić Aug 05 '16 at 23:30
  • 2
    @ThomasMatthews comparisons are not branches. – Fatih BAKIR Aug 06 '16 at 00:03
  • 3
    @NicolBolas, but as far as I see, his second snippet doesn't contain any branch, therefore he seems to be aware of that? Even the title says so... – Fatih BAKIR Aug 06 '16 at 00:07
  • 1
    @FatihBAKIR: Sorry, I got confused as to who said what. – Nicol Bolas Aug 06 '16 at 00:08
  • @ThomasMatthews Added version without the comparison with 9 – Franko Leon Tokalić Aug 06 '16 at 00:28
  • @Byteventurer, the problem is not comparison! Those bitwise operations may very well be slower than just comparing directly against 9. – Fatih BAKIR Aug 06 '16 at 00:30
  • @FatihBAKIR sure, but I think it's branchless? Isn't that what we are looking for? Though the comparison one should be branchless too. Just wanted to get rid of some of the downvotes, as I don't know what the reason for them really is – Franko Leon Tokalić Aug 06 '16 at 00:33
  • Calling a function usually requires a branch unless the compiler can optimize it away and is told to do so. The idea is to remove all branches and function calls. – Thomas Matthews Aug 06 '16 at 17:55
  • @FatihBAKIR: Please translate a compare into assembly language. All the assembly languages I've seen require a compare instruction and a jump (branch) based on the condition code. Thus there is a branch. Take the code in this answer and look at the assembly language. The ARM series is an exception, as their instructions (in Supervisor mode), can conditionally execute. – Thomas Matthews Aug 06 '16 at 17:59
  • @ThomasMatthews Fatih BAKIR did provide a disassembly in his answer – Franko Leon Tokalić Aug 06 '16 at 18:03
  • @ThomasMatthews As for the function call, it can be taken out, and probably is inlined – Franko Leon Tokalić Aug 06 '16 at 18:04