0

If there are similar questions, please direct me there; I searched for quite some time but didn't find anything.

Background:

I was just playing around and found some behavior I can't completely explain... For primitive types, it looks like an assignment that involves an implicit conversion takes longer than one with an explicit cast.

#include <limits>

int iTest = 0;
long lMax = std::numeric_limits<long>::max();
for (int i = 0; i < 100000; ++i)
{
    // I had 3 such loops, each running 1 of the lines below.
    iTest = lMax;
    iTest = (int)lMax;
    iTest = static_cast<int>(lMax);
}
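
Roughly, each loop was timed like this (a minimal sketch using std::chrono; my actual harness isn't shown here, so take the details as assumptions):

#include <chrono>
#include <cstdio>
#include <limits>

int main()
{
    int iTest = 0;
    long lMax = std::numeric_limits<long>::max();

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < 100000; ++i)
        iTest = lMax;   // or (int)lMax, or static_cast<int>(lMax)
    auto stop = std::chrono::steady_clock::now();

    auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    std::printf("iTest=%d, lMax=%ld\n(iTest = lMax) used %lld microseconds\n",
                iTest, lMax, static_cast<long long>(us));
}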

The result is that the C-style cast and the C++-style static_cast perform the same on average (the numbers differ between runs, but there is no visible difference), and they both outperform the implicit assignment.

Result:
iTest=-1, lMax=9223372036854775807
(iTest = lMax) used 276 microseconds

iTest=-1, lMax=9223372036854775807
(iTest = (int)lMax) used 191 microseconds

iTest=-1, lMax=9223372036854775807
(iTest = static_cast<int>(lMax)) used 187 microseconds

Question:

Why does the implicit conversion result in larger latency? My guess is that the assignment has to detect that the value overflows int and adjust it to -1. But what exactly is going on in the assignment?

Thanks!

Ooops
  • I would guess your benchmarking is flawed. – Henrik Mar 26 '15 at 08:48
  • Try altering the order of the tests. Maybe it's always the first one that takes longest. Also, it looks like 100000 is too small here. Make the tests longer. – TonyK Mar 26 '15 at 09:04

2 Answers

3

If you want to know why something is happening under the covers, the best place to look is ... wait for it ... under the covers :-)

That means examining the assembler language that is produced by your compiler.

A C++ environment is best thought of as an abstract machine for running C++ code. The standard (mostly) dictates behaviour rather than implementation details. Once you leave the bounds of the standard and start thinking about what happens underneath, the C++ source code is of little help anymore - you need to examine the actual code that the computer is running, the stuff output by the compiler (usually machine code).

It may be that the compiler is throwing away the loop because it's calculating the same thing every time, so it only needs to do it once. It may be that it throws away the code altogether if it can determine you don't use the result.
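
As a hypothetical sketch (not necessarily what your compiler does): the loop in your question can collapse to a single store, or vanish entirely, once the optimiser sees that nothing observable depends on the repeated assignments. Making the result observable, for example with volatile, is one way to keep it alive:

#include <limits>

int main()
{
    volatile int iTest = 0;   // volatile: every assignment is an observable side effect
    long lMax = std::numeric_limits<long>::max();

    // Without the volatile, an optimising compiler is free to collapse this
    // loop into a single store, or to remove it if iTest is never read again.
    for (int i = 0; i < 100000; ++i)
        iTest = static_cast<int>(lMax);

    return iTest;             // read the result so it cannot be discarded
}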

There was a time many moons ago, when the VAX Fortran compiler (I did say many moons) outperformed its competitors by several orders of magnitude in a given benchmark.

That was for that exact reason. It had determined the results of the loop weren't used so had optimised the entire loop out of existence.


The other thing you might want to watch out for is the measuring tools themselves. When you're talking about durations of 1/10,000th of a second, your results can be swamped by the slightest bit of noise.

There are ways to alleviate these effects such as ensuring the thing you're measuring is substantial (over ten seconds for example), or using statistical methods to smooth out any noise.
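
For example, one simple approach (a sketch only, assuming std::chrono is an acceptable clock for this) is to repeat the measurement many times and report the minimum rather than trusting any single run:

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>

int main()
{
    std::vector<long long> samples;
    for (int run = 0; run < 50; ++run)          // many repetitions...
    {
        auto start = std::chrono::steady_clock::now();
        // ... the code under test goes here ...
        auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count());
    }
    // ...then look at the minimum (least noisy) sample rather than a single one.
    std::printf("best of %zu runs: %lld us\n", samples.size(),
                *std::min_element(samples.begin(), samples.end()));
}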

But the bottom line is, it may be the measuring methodology causing the results you're seeing.

paxdiablo
  • "A C++ environment is best thought of as a virtual machine for running C++ code." -- Err, no it is not, because it *is* not... – DevSolar Mar 26 '15 at 08:56
  • @DevSolar, from C++11 1.9: The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below. – paxdiablo Mar 26 '15 at 09:01
  • Though admittedly I misremembered the quote, so I've changed the answer to use the better phrase "abstract machine". – paxdiablo Mar 26 '15 at 09:02
  • I understand the concept of the standard wording. But while it helps *defining* the language, there is nothing "virtual" or "abstract" about a specific implementation. That is why I have a problem with the VM terminology. Next thing we're calling directly executable machine code "bytecode", and the CPU a JIT compiler... ah, forget it. Have a nice day! ;-) – DevSolar Mar 26 '15 at 09:07
  • I used that phrase because no implementation was specified. Hence it should be considered a question about C++ itself rather than some implementation of it. I'll try to clarify. – paxdiablo Mar 26 '15 at 09:10
  • It is the benchmark, I guess. When the execution time reaches the level of seconds, the difference is gone. I need to play with the assembly, I guess. BTW, -O0 was used, so the loop is not optimized away by the compiler (at least not by my compiler); otherwise there wouldn't be an execution time difference. Thanks for the answer! – Ooops Mar 26 '15 at 09:12
  • Thanks for the effort, and +1 to you. – DevSolar Mar 26 '15 at 09:13

3

#include <limits>

int iTest = 0;
long lMax = std::numeric_limits<long>::max();

void foo1()
{
  iTest = lMax;
}

void foo2()
{
  iTest = (int)lMax;
}

void foo3()
{
  iTest = static_cast<int>(lMax);
}

Compiling this with GCC 5 using -O3 yields:

__Z4foo1v:
    movq    _lMax(%rip), %rax
    movl    %eax, _iTest(%rip)
    ret

__Z4foo2v:
    movq    _lMax(%rip), %rax
    movl    %eax, _iTest(%rip)
    ret

__Z4foo3v:
    movq    _lMax(%rip), %rax
    movl    %eax, _iTest(%rip)
    ret

They are all exactly the same.
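
For reference, output like the above can be produced with something along these lines (the exact compiler name and the name mangling will vary between platforms):

g++ -O3 -S -o - foo.cpp | c++filt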

Since you didn't provide a complete example I can only guess that the difference is due to something you aren't showing us.

user657267