The first version will often be faster, even with modern compilers. It is hard for the optimizer to prove that the size does not change, because the location written to in the loop body could alias the vector's internal bookkeeping, so in many cases the second version has to recalculate the size on every loop iteration.
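To make the aliasing problem concrete, here is a minimal sketch (my own illustration, not code from the original). With an element type like char, which may legally alias any object, including the vector's internal pointers, the optimizer genuinely cannot hoist the size computation:

#include <vector>

// A write through vec[i] could, for all the optimizer can prove, modify
// the begin/end pointers that size() is computed from, so the comparison
// against vec.size() must be redone on every iteration.
void fillUncached(std::vector<char>& vec, char val) {
    for (std::vector<char>::size_type i = 0; i < vec.size(); ++i) {
        vec[i] = val; // potential alias of the vector's own bookkeeping
    }
}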
I measured this with a Visual Studio 2013 Release build and found a performance difference for both 32-bit and 64-bit code. Both versions are handily beaten by std::fill() (shown after the table). These measurements are averages over 1000 runs with 10 million element vectors (increasing the number of elements to a billion somewhat reduces the performance difference as memory access becomes more of a bottleneck).
Method               Time relative to uncached for loop
                        x86     x64
uncached for loop      1.00    1.00
cached for loop        0.70    0.98
std::fill()            0.42    0.57
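For reference, the std::fill() row corresponds to a call along these lines (a sketch; the benchmarked code isn't shown in the original):

#include <algorithm>
#include <vector>

void fillVector(std::vector<int>& vec, int val) {
    std::fill(vec.begin(), vec.end(), val);
}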
The baseline cached-size for loop:
const auto size = vec.size();
for (vector<int>::size_type i = 0; i < size; ++i) {
vec[i] = val;
}
Compiles to this loop body (x86 Release):
00B612C0 mov ecx,dword ptr [esi]        <-- Load vec.begin()
00B612C2 mov dword ptr [ecx+eax*4],edi  <-- vec[i] = val
00B612C5 inc eax                        <-- ++i
00B612C6 cmp eax,edx                    <-- Compare i against the cached size
00B612C8 jb forCachedSize+20h (0B612C0h)
Whereas the version that does not cache the vector's size:
for (vector<int>::size_type i = 0; i < vec.size(); ++i) {
vec[i] = val;
}
Compiles to this, which recomputes vec.size() every time through the loop:
00B612F0 mov dword ptr [edx+eax*4],edi
00B612F3 inc eax
00B612F4 mov ecx,dword ptr [esi+4] <-- Load vec.end()
00B612F7 mov edx,dword ptr [esi] <-- Load vec.begin()
00B612F9 sub ecx,edx <-- ecx = vec.end() - vec.begin()
00B612FB sar ecx,2 <-- ecx = (vec.end() - vec.begin()) / sizeof(int)
00B612FE cmp eax,ecx
00B61300 jb forComputedSize+20h (0B612F0h)
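For completeness, the timings above can be reproduced with a harness along these lines (a sketch with my own names; the original measurement code isn't shown):

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t runs = 1000;
    std::vector<int> vec(10000000); // 10 million elements
    const int val = 42;

    const auto start = std::chrono::high_resolution_clock::now();
    for (std::size_t run = 0; run != runs; ++run) {
        // Swap this body for the cached-size loop or std::fill()
        // to compare the three methods.
        for (std::vector<int>::size_type i = 0; i < vec.size(); ++i) {
            vec[i] = val;
        }
    }
    const auto stop = std::chrono::high_resolution_clock::now();

    // Use the result so the compiler cannot discard the loops entirely.
    std::printf("%d\n", vec[0]);

    const auto totalMs =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    std::printf("average per run: %.3f ms\n", static_cast<double>(totalMs) / runs);
    return 0;
}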