Context:
I am currently investigating the performance difference between my team's current build environment, which still uses gcc-4.1.2, and a build using gcc-4.8.1. The results were astounding: a weighted-average speedup of at least 25% across our regression suite. I also threw in a third build using gcc-4.4.7, but it only saw a weighted-average speedup of about 7%. I suspected that the large discrepancy was related to the new C++11 copy/move semantics, since memory transactions are a rather large bottleneck in our program. We make extensive use of STL types, so perhaps the compiler made good use of their new move constructors.
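One prerequisite for this hypothesis is that the gcc-4.8.1 build actually compiles in C++11 mode. gcc-4.8.1 defaults to -std=gnu++98, so unless the build passes -std=c++11, -std=gnu++11 or the older -std=c++0x spelling, the rvalue-reference overloads and move constructors in libstdc++ are not compiled in at all. The function below is a hypothetical helper (not part of our code base), a minimal sketch that reports which dialect a translation unit was built with:

```cpp
// Hypothetical helper: reports the language dialect this translation unit
// was compiled in. gcc-4.8.1 without an explicit -std flag yields
// __cplusplus == 199711L, i.e. C++98/03 with no move semantics.
const char* languageDialect() {
#if __cplusplus >= 201103L
    return "C++11 or later";   // rvalue references and move constructors available
#else
    return "C++98/03";         // STL containers can only copy their elements
#endif
}
```

Alternatively, running `g++ -dM -E -x c++ /dev/null | grep __cplusplus` with the same -std flag as the real build prints the value directly.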
To test this hypothesis, I picked a test whose speedup was close to the average and ran kcachegrind on it for both compilations. The results are posted below, and they were not quite what I expected. One quick and potentially relevant detail: for bureaucratic reasons, I had to statically compile gcc-4.8.1's libstdc++.so. As a result, the locations shown in kcachegrind include some private libraries, which I have censored for safety.
Much to my surprise, the number of calls to memory operations (malloc and _int_malloc) was relatively unchanged. Another interesting result is the complete absence of memcpy and the appearance of _memcmp_sse4_1.
Questions:
If I want to verify my hypothesis that C++11 semantics are the reason for the improved performance, what should I be looking for in the callgrind graphs? Should I expect fewer memory accesses, or should I expect to find std::string(string&&) signatures (which I don't find here)? Keep in mind that this is compiled with -O3, which might mean that such calls are inlined or optimized away, hence my dilemma.
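One way to sidestep the -O3 problem is to count copies and moves directly instead of searching the profile for move-constructor symbols: a counter increment is an observable side effect, so inlining cannot remove it the way it removes a std::string(string&&) frame from a callgrind graph. The sketch below is a hypothetical test harness, not code from our build; Tracked, CopyMoveCounts and countPushBack are illustrative names. It pushes temporaries into a std::vector, which is exactly the kind of STL usage where C++11 should replace copies with moves.

```cpp
#include <vector>

// Hypothetical instrumented element type: bumps a counter on every copy or move.
struct Tracked {
    static long copies;
    static long moves;

    Tracked() {}
    Tracked(const Tracked&) { ++copies; }
    Tracked& operator=(const Tracked&) { ++copies; return *this; }
#if __cplusplus >= 201103L
    // noexcept matters: std::vector relocates existing elements with the
    // move constructor during reallocation only if it cannot throw.
    Tracked(Tracked&&) noexcept { ++moves; }
    Tracked& operator=(Tracked&&) noexcept { ++moves; return *this; }
#endif
};

long Tracked::copies = 0;
long Tracked::moves = 0;

struct CopyMoveCounts {
    long copies;
    long moves;
};

// Pushes n temporaries into a vector (letting it reallocate as it grows)
// and reports how many copy and move operations that cost.
CopyMoveCounts countPushBack(int n) {
    Tracked::copies = 0;
    Tracked::moves = 0;
    {
        std::vector<Tracked> v;
        for (int i = 0; i < n; ++i)
            v.push_back(Tracked());   // rvalue: push_back(T&&) in C++11
    }
    CopyMoveCounts result = { Tracked::copies, Tracked::moves };
    return result;
}
```

Compiled in C++98/03 mode, moves stays at 0 and every push_back and every reallocation is counted as a copy; compiled with -std=c++11, the copy count should drop to 0. If the production build turns out to be in C++98 mode, move semantics cannot be the explanation for the speedup.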
I am more than happy to report such a large performance increase, but I would like to understand where it is coming from. Let me know if I should post more results to allow a more definitive answer. I hope this isn't too general a question for SO...