Pinpointing performance optimizations between gcc-4.1.2 and gcc-4.8.1

Question

Context:

I am currently investigating the performance increase between my team's current build environment, which still uses gcc-4.1.2, and a build using gcc-4.8.1. The results were astounding, with a weighted average speedup of at least 25% across a regression run. I also threw in another test using gcc-4.4.7, but it only saw a weighted average speedup of about 7%. I speculated that the large discrepancy was related to the new C++11 copy/move semantics, since memory transactions were a rather large bottleneck in our program. We make extensive use of STL types, so perhaps the compiler made good use of their new move constructors.

To test this hypothesis, I picked a test that showed roughly the average performance improvement, and I ran kcachegrind on it for both compilations. The results are posted below, and were not quite what I expected. I should point out a quick and potentially relevant detail: I had to statically link gcc-4.8.1's libstdc++ for bureaucratic reasons. This means that kcachegrind's locations show some private libraries that I've censored for safety.

[Image: kcachegrind results for the gcc-4.1.2 and gcc-4.8.1 builds]

Much to my surprise, the number of calls to memory operations was relatively unchanged (malloc and _int_malloc). Another interesting result is the complete absence of memcpy and the addition of _memcmp_sse4_1.

Questions:

If I wanted to verify my hypothesis that C++11 semantics are the reason for the improved performance, what should I be looking for in callgrind graphs? Should I find fewer memory accesses, or should I even find std::string(string&&) signatures (which I actually don't find here)? Keep in mind that this is compiled with -O3, which might mean that such signatures are optimized out, hence my dilemma.

I am more than happy to report such a large performance increase, but I would like to understand where this performance is coming from. Let me know if more results need to be reported for more definitive answers. I hope this isn't too general of a question for SO...

Your assumptions might be right. I can see that you use `std::string` which can benefit from move semantics. But it is very hard to say without any real code. — edmz, Sep 15 '14 at 16:44
Is there a specific signature to show when a move constructor is being used as opposed to a copy constructor though? I'm not seeing any difference in the number of calls to `std::basic_string<...>::ctor` or `std::string::_Rep::_S_create(...)`. — Suedocode, Sep 15 '14 at 17:01
This is beginning to get out of the scope of this question, but gcc-4.1.2 shows far more variety in `std::_Rb_tree` calls, as well as far bigger differences in call frequency (higher for 4.8.1, oddly enough). Perhaps the speedup comes from a better implementation of maps. However, I would like to focus this question on what trends I should look for if I'm trying to find uses of move semantics. I've tried running small tests that use `std::move` to see what they look like in kcachegrind; however, they are all either being optimized out or they simply don't show up as `Foo&&` signatures. — Suedocode, Sep 15 '14 at 17:28
Yes, you would see the move constructor/assignment getting called. But, as you've said, they're typically optimized out. (that's something you do want) — edmz, Sep 15 '14 at 17:56
