Why does rounding-to-nearest-with-ties-away-from-zero require more instructions and what is their purpose?

Question

Consider this example, in which various rounding operations (round-down, round-up, round-toward-zero and round-to-nearest-with-ties-to-even) can each be expressed with a single roundsd instruction:

use_floor(double):
        roundsd xmm0, xmm0, 9
        ret
use_ceil(double):
        roundsd xmm0, xmm0, 10
        ret
use_trunc(double):
        roundsd xmm0, xmm0, 11
        ret
use_nearby(double):
        roundsd xmm0, xmm0, 12
        ret

In contrast, round-to-nearest-with-ties-away-from-zero requires additional instructions:

use_round(double):
        movapd  xmm1, xmm0
        andpd   xmm0, XMMWORD PTR .LC1[rip]
        orpd    xmm0, XMMWORD PTR .LC0[rip]
        addsd   xmm0, xmm1
        roundsd xmm0, xmm0, 3
        ret

Why does this rounding mode require more instructions on x86 (unlike on Arm) and how do these bit operations on a floating-point value end up implementing the desired semantics?
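
For reference, the immediate operands in the listings above can be decoded with a small sketch like the one below. It assumes the imm8 layout given in Intel's description of ROUNDSD (bits 1:0 select the rounding mode, bit 2 defers to MXCSR.RC instead, bit 3 suppresses the precision exception). The enum and function names are made up for illustration.

```c
/* Hypothetical names; only the bit layout comes from the ROUNDSD description. */
enum rounding_source {
    RC_NEAREST_EVEN,  /* imm8[1:0] = 00 */
    RC_DOWN,          /* imm8[1:0] = 01, toward negative infinity */
    RC_UP,            /* imm8[1:0] = 10, toward positive infinity */
    RC_TOWARD_ZERO,   /* imm8[1:0] = 11, truncate */
    RC_FROM_MXCSR     /* imm8[2] = 1, use the current MXCSR rounding mode */
};

/* Which rounding mode does a roundsd immediate select? */
enum rounding_source roundsd_mode(unsigned imm8)
{
    if (imm8 & 0x4u)                 /* bit 2 set: ignore bits 1:0 */
        return RC_FROM_MXCSR;
    switch (imm8 & 0x3u) {
    case 0:  return RC_NEAREST_EVEN;
    case 1:  return RC_DOWN;
    case 2:  return RC_UP;
    default: return RC_TOWARD_ZERO;
    }
}

/* Bit 3 set means the precision (inexact) exception is not signaled. */
int roundsd_suppresses_inexact(unsigned imm8)
{
    return (imm8 & 0x8u) != 0;
}
```

Under that layout, 9, 10 and 11 select round-down, round-up and truncation with the inexact exception suppressed, 12 uses whatever mode MXCSR currently holds, and the 3 in use_round is truncation without suppression. None of the encodings selects ties-away-from-zero.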

Why? Because the ISA designers did not include such a rounding mode. The code works by isolating the sign bit and applying it to `0.499...` to create an appropriately signed offset, which is then added so that rounding toward zero produces the correct result. It's basically `trunc(x < 0 ? x - 0.499... : x + 0.499...)` — Jester, Jan 06 '22 at 13:27
x86 chose to support those four rounding modes and no others, https://www.felixcloutier.com/x86/roundpd#fig-4-24. ARM chose to also support the fifth one. — Nate Eldredge, Jan 07 '22 at 01:57
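
The trick described in the comments can be sketched in C roughly as follows. This is an illustration, not the compiler's actual source. It assumes .LC1 is the sign-bit mask and .LC0 is the largest double below 0.5 (neither constant is shown in the listing), it assumes the default round-to-nearest mode for the addition, and the helper names are hypothetical. The truncation helper only stands in for roundsd with immediate 3; unlike the instruction, it does not preserve the sign of a zero result.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values of the constants referenced as .LC1 and .LC0. */
#define SIGN_MASK 0x8000000000000000ULL  /* only the sign bit set */
#define HALF_PRED 0x3FDFFFFFFFFFFFFFULL  /* largest double below 0.5 */

/* Stand-in for "roundsd xmm0, xmm0, 3" (round toward zero). */
static double trunc_toward_zero(double y)
{
    /* NaN, infinities and |y| >= 2^52 have no fractional part to drop. */
    if (!(y > -4503599627370496.0 && y < 4503599627370496.0))
        return y;
    return (double)(long long)y;  /* C conversion truncates toward zero */
}

/* Hypothetical C rendering of the use_round instruction sequence. */
double round_via_trunc(double x)
{
    uint64_t bits;
    double offset;

    memcpy(&bits, &x, sizeof bits);
    bits &= SIGN_MASK;                 /* andpd: keep only the sign of x   */
    bits |= HALF_PRED;                 /* orpd:  sign(x) combined with 0.499... */
    memcpy(&offset, &bits, sizeof offset);

    return trunc_toward_zero(x + offset);  /* addsd, then roundsd imm 3 */
}
```

With these constants, round_via_trunc(2.5) gives 3.0 and round_via_trunc(-2.5) gives -3.0, because the sum lands just short of the next integer and the floating-point addition rounds it onto that integer before truncation. The offset is slightly less than 0.5 rather than exactly 0.5 so that an input such as 0.49999999999999994 (the largest double below 0.5) still truncates to 0 instead of being pushed up to 1.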
