
I have a `std::vector` that I need to loop through often. I see two ways of doing it.

First way:

const size_t SIZE = myVec.size();
for (size_t i = 0; i < SIZE; i++)
{
    myVec[i] = 0;
}

Second way:

for (size_t i = 0; i < myVec.size(); i++)
{
    myVec[i] = 0;
}

Is the first more efficient than the second, or do modern compilers know to optimize the second implementation to make it as efficient as the first?

FWIW, I am on Visual Studio 2013.

The Vivandiere
    Any decent compiler *should* optimise the second; but the only way to be sure is to measure it. How about `for (auto & x : myVec) x = 0;`? or `std::fill(myVec.begin(), myVec.end(), 0)`? – Mike Seymour Aug 27 '14 at 16:31
  • Thanks; the reason I posted is that I'd like to know what the behavior would be on most compilers, not just the one I am using currently. Profiling only tells me about the current compiler. – The Vivandiere Aug 27 '14 at 16:34
  • possible duplicate of [Performance issue for vector::size() in a loop](http://stackoverflow.com/questions/3901630/performance-issue-for-vectorsize-in-a-loop) – quantdev Aug 27 '14 at 16:34
  • @MikeSeymour is right: this is a job for `std::fill` or `std::fill_n`. – Jerry Coffin Aug 27 '14 at 16:38
  • @JerryCoffin, a lot of my loops involve complicated computations, like myVec[i] = a+b*c+d*e/f. What's a good way to do these? – The Vivandiere Aug 27 '14 at 16:39
  • @user3670482: In that case you use `std::generate` or `std::generate_n`. `std::generate(myVec.begin(), myVec.end(), [=]{ return a+b*c+d*e/f; });` – Jerry Coffin Aug 27 '14 at 16:40
  • @JerryCoffin, Thanks a lot! What if the a's and b's are vectors themselves? – The Vivandiere Aug 27 '14 at 16:54
  • @user3670482: Then you'll probably want to capture by reference instead of value: `[&]{/* ... */ }` (and, of course, you can only use syntax that's defined for them, so `a+b` won't work unless you define it somewhere). – Jerry Coffin Aug 27 '14 at 16:58 (a sketch of this approach appears after the comments)
  • I would not use uppercase identifiers, especially on VS – Slava Aug 27 '14 at 17:01
  • @user3670482 because when you hit a preprocessor macro with the same name you will enjoy spending time breaking your head on cryptic error messages. – Slava Aug 27 '14 at 17:28
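
Following up on the comment thread above, here is a rough sketch (mine, not Jerry Coffin's exact code) of filling a vector from element-wise arithmetic on other vectors with `std::generate`, capturing the inputs by reference and keeping a running index. The function name, vector names, and element type are assumed for illustration:

#include <algorithm>
#include <vector>

// Sketch only: a, b and c are assumed to have the same length as out.
void combine(std::vector<double>& out,
             const std::vector<double>& a,
             const std::vector<double>& b,
             const std::vector<double>& c)
{
    std::size_t i = 0;
    std::generate(out.begin(), out.end(), [&] {
        double r = a[i] + b[i] * c[i];  // element-wise computation
        ++i;
        return r;
    });
}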

5 Answers


The first version will often be faster, even with modern compilers. It is hard for the optimizer to prove that the size does not change, because the stores in the loop body may alias the vector's internal bookkeeping, so in many cases the second version has to recalculate the size on every loop iteration.

I measured this in a Visual Studio 2013 Release build and found a performance difference for both 32-bit and 64-bit code. Both versions are handily beaten by std::fill(). These measurements are averages over 1000 runs with 10-million-element vectors (increasing the number of elements to a billion somewhat reduces the performance difference, as memory access becomes more of a bottleneck).

Method                   Time relative to uncached for loop
                         x86      x64

uncached for loop        1.00     1.00
cached for loop          0.70     0.98
std::fill()              0.42     0.57
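
For reference, the std::fill() row above corresponds to a call along these lines (the benchmark harness is not shown in this answer, so treat this as a sketch):

// #include <algorithm>
std::fill(vec.begin(), vec.end(), val);  // same effect as the loops below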

The baseline cached size for loop code:

const auto size = vec.size();
for (vector<int>::size_type i = 0; i < size; ++i) {
    vec[i] = val;
}

Compiles to this loop body (x86 Release):

00B612C0  mov         ecx,dword ptr [esi]  
00B612C2  mov         dword ptr [ecx+eax*4],edi  
00B612C5  inc         eax  
00B612C6  cmp         eax,edx  
00B612C8  jb          forCachedSize+20h (0B612C0h)  

Whereas the version that does not cache the vector's size:

for (vector<int>::size_type i = 0; i < vec.size(); ++i) {
    vec[i] = val;
}

Compiles to this, which recomputes vec.size() every time through the loop:

00B612F0  mov         dword ptr [edx+eax*4],edi  
00B612F3  inc         eax  
00B612F4  mov         ecx,dword ptr [esi+4]            <-- Load vec.end()
00B612F7  mov         edx,dword ptr [esi]              <-- Load vec.begin()
00B612F9  sub         ecx,edx                          <-- ecx = vec.end() - vec.begin()
00B612FB  sar         ecx,2                            <-- ecx = (vec.end() - vec.begin()) / sizeof(int)
00B612FE  cmp         eax,ecx  
00B61300  jb          forComputedSize+20h (0B612F0h)  
mattnewport

I prefer writing my loops like the first case. With the second case and std::vector::size(), you might pay for a few extra loads in the compiler-optimized version, but when you start working with more complicated data structures, those simple loads can become expensive tree lookups.

Even with that preference, context sometimes requires you to write your loop in the second form. The first case signals that the size of the container does not change, since the size is checked only once. In the second case, the size is checked every iteration, which hints to the reader that the body might mutate the size of the container.

If you are mutating the container in your loop body, then use the second form and comment that you are mutating your container and want to check its size. Otherwise, prefer the first.
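
To illustrate that last point, here is a rough sketch (not part of the original answer, and assuming a vector of int as in the question) of a loop whose body grows the container and therefore genuinely needs to re-read size() every iteration:

// Sketch only: appending while indexing is fine, but a cached-size loop
// would never visit the newly added elements.
for (size_t i = 0; i < myVec.size(); ++i) {
    if (myVec[i] < 0) {
        myVec.push_back(-myVec[i]);  // the container grows, so size() changes
    }
}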

Snowhawk

As noted here, the complexity of vector<T>::size() must be constant on all compilers, so it doesn't really matter which one you are using.

Nikola Dimitroff
  • Not necessarily, the complexity is constant but it doesn't have to be the same constant as storing it locally. – CashCow Aug 27 '14 at 16:35

With any decent modern C++ compiler, the two versions won't make any difference in terms of performance, because the optimizer will optimize the repeated size() call away. Nevertheless, I use the following version:

for (size_t i(0), ie(myVec.size()); i < ie; ++i) {
    // do stuff
}
101010
  • +1 but please, use `for (size_t i = 0, sz = myVec.size(); i < sz; ++i)`. The direct initialization and unusual name for the size just require more reading comprehension. I know it's a nitpick, but over a large codebase, these things add up. – TemplateRex Aug 27 '14 at 18:42
  • It's not true that any decent compiler will optimize away the difference. Like any question involving performance, you should really measure before making any statements like this. – mattnewport Sep 10 '14 at 21:50
  • @mattnewport oh it's very very true. It just happens that you don't have a clue about it. – 101010 Sep 10 '14 at 21:55
  • I took the measurements, and it does make a difference, both to performance and code generation on Visual Studio 2013 in an optimized build. This is also expected - the semantics of the code are different due to possible aliasing of the vector size and many compilers will not be able to optimize away reloading the size on each iteration of the loop. This is what accounts for the difference in performance in Visual Studio. Both versions are also significantly slower than calling std::fill() – mattnewport Sep 10 '14 at 22:25

Getting the size of the vector is always constant time.

Where your algorithm might turn out to be less efficient is the use of myVec[i] for each index: it has to extract the data pointer and add i to it every time. Raw pointer arithmetic is likely to beat that for performance, and the vector's iterator, which is likely to be implemented as a pointer, will probably outperform your indexed loop.
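
An iterator-based version of the zeroing loop, along the lines this answer suggests (a sketch, assuming vector<int> as in the question, not CashCow's code), would look like:

// The end iterator is computed once, and dereferencing the iterator avoids
// re-deriving begin() + i on every pass.
for (std::vector<int>::iterator it = myVec.begin(), end = myVec.end(); it != end; ++it) {
    *it = 0;
}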

If you are setting all the values to 0, you can probably outperform even that with a single function call rather than a loop; in this case:

myVec.assign( myVec.size(), 0 );

CashCow