It would really depend on the processor, and the range of the integer which is better (and using double
would resolve most of the range issues)
For modern "big" CPUs like x86-64 and ARM, integer division and floating point division are roughly the same effort, and converting an integer to a float or vice versa is not a "hard" task (and does the correct rounding directly in that conversion, at least), so most likely the resulting operations are.
atmp = (float) a;
btmp = (float) b;
resfloat = divide atmp/btmp;
return = to_int_with_rounding(resfloat)
About four machine instructions.
On the other hand, your code uses two divides, one modulo and a multiply, which is quite likely longer on such a processor.
tmp = a/b;
tmp1 = a % b;
tmp2 = tmp1 * 2;
tmp3 = tmp2 / b;
tmp4 = tmp + tmp3;
So five instructions, and three of those are "divide" (unless the compiler is clever enough to reuse a / b
for a % b
- but it's still two distinct divides).
Of course, if you are outside the range of number of digits that a float or double can hold without losing digits (23 bits for float, 53 bits for double), then your method MAY be better (assuming there is no overflow in the integer math).
On top of all that, since the first form is used by "everyone", it's the one that the compiler recognises and can optimise.
Obviously, the results depend on both the compiler being used and the processor it runs on, but these are my results from running the code posted above, compiled through clang++
(v3.9-pre-release, pretty close to released 3.8).
round_divide_by_float_casting(): 32.5 ns
round_divide_by_modulo(): 113 ns
divide_by_quotient_comparison(): 80.4 ns
However, the interesting thing I find when I look at the generated code:
xorps %xmm0, %xmm0
cvtsi2ssl 8016(%rsp,%rbp), %xmm0
xorps %xmm1, %xmm1
cvtsi2ssl 4016(%rsp,%rbp), %xmm1
divss %xmm1, %xmm0
callq roundf
cvttss2si %xmm0, %eax
movl %eax, 16(%rsp,%rbp)
addq $4, %rbp
cmpq $4000, %rbp # imm = 0xFA0
jne .LBB0_7
is that the round
is actually a call. Which really surprises me, but explains why on some machines (particularly more recent x86 processors), it is faster.
g++
gives better results with -ffast-math
, which gives around:
round_divide_by_float_casting(): 17.6 ns
round_divide_by_modulo(): 43.1 ns
divide_by_quotient_comparison(): 18.5 ns
(This is with increased count to 100k values)