NEON int32 conversion to float gives wrong result

Question

In NEON inline assembly, converting a signed int32 to float changes the value.

Here the output for Float and Signed int32 is printed:

The values differ seemingly at random (not only for every even number). The only operation between storing the value as sint32 and storing it as float is the conversion; nothing else is done.

How can I avoid this? Thanks.

Jake 'Alquimista' LEE · Answer 1 · 2017-12-12T12:26:42.377

4

Float has only 23 bits assigned to the mantissa, with a separate sign bit (MSB).

Hence any int32 outside the -2^24 to 2^24-1 window will lose precision during the conversion (truncation occurs).

It is nothing ARM/NEON specific.

https://en.wikipedia.org/wiki/Single-precision_floating-point_format
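To see which values are affected, here is a minimal sketch in plain C (no NEON involved). The helper name `int32_exact_in_float` is hypothetical and not from the question; it simply checks whether a value survives the int32 to float conversion unchanged. The `volatile` is there on the assumption that some compilers may otherwise keep the intermediate value in a wider register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if v is unchanged by an
 * int32 -> float conversion. The comparison is done in int64_t
 * because (float)INT32_MAX rounds up to 2^31, which does not fit
 * back into an int32_t. */
static bool int32_exact_in_float(int32_t v)
{
    volatile float f = (float)v;   /* force storage as single precision */
    return (int64_t)f == (int64_t)v;
}
```

With this, `int32_exact_in_float(16777216)` (2^24) holds, while `int32_exact_in_float(16777217)` (2^24 + 1) does not, which matches the scattered differences described in the question.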

edited Dec 12 '17 at 12:26

answered Dec 12 '17 at 12:22


The significands (the preferred term; significands are linear, whereas mantissa is historically logarithmic) have 24 bits. Although only 23 bits are explicitly stored, the 24th bit is inferred from the significand and exponent combined. This makes the range of representable integers from -2**24 to +2**24, not 2**23. – Eric Postpischil Dec 12 '17 at 12:24
@EricPostpischil You are right! I'll amend my answer accordingly. – Jake 'Alquimista' LEE Dec 12 '17 at 12:25
Some integers outside that range *are* exactly representable. Specifically, integers with enough trailing zeros. i.e. which have only 24 or fewer significant digits in base 2. So it's not quite true that *any* `int32` outside that range loses precision; e.g. any power of 2 (lower than FLT_MAX) can be exactly represented, including values outside the range that `int32_t` can hold. All the `float` values outside that 24-bit range are integers. – Peter Cordes Dec 13 '17 at 05:43
And they're probably not truncated toward zero; don't ARM int<->float conversions use the default rounding mode (round to nearest-even)? – Peter Cordes Dec 13 '17 at 05:45
@PeterCordes It depends on the configuration. And I've been working mostly on fixed numbers, avoiding float types if possible. – Jake 'Alquimista' LEE Dec 13 '17 at 07:02

score 1 · Answer 2 · answered Dec 12 '17 at 12:23

The significands (fraction portions) of single-precision floating-point numbers are only 24 bits. (23 bits are explicitly stored; 1 is inferred from the exponent and significand combined.) So integers with values above 2²⁴ have to be rounded to fit in the floating-point format.
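The 24-bit limit can also be checked without any floating-point arithmetic. The following is a sketch under the assumption that the target uses IEEE 754 single precision; the function name `fits_in_24_bit_significand` is made up for illustration. It strips trailing zero bits from the magnitude and tests whether what remains fits in 24 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if |v| needs at most 24 significant
 * binary digits, i.e. it is exactly representable in an IEEE 754
 * single-precision float (assumed format). */
static bool fits_in_24_bit_significand(int32_t v)
{
    /* Take the magnitude in 64 bits so that INT32_MIN does not overflow. */
    uint64_t m = v < 0 ? (uint64_t)(-(int64_t)v) : (uint64_t)v;

    if (m == 0)
        return true;
    while ((m & 1u) == 0)          /* trailing zeros go into the exponent */
        m >>= 1;
    return m < (UINT64_C(1) << 24);
}
```

For example, 2^24 + 2 passes the check because its trailing zero bit is absorbed by the exponent, whereas 2^24 + 1 fails and has to be rounded.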

score 1 · Answer 3 · answered Dec 13 '17 at 11:01

Solved by using NEON instructions to convert to 64-bit int first and then to 64-bit float.
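The exact instructions are not shown here, so the following is only a sketch of one way this could look, written with AArch64 intrinsics from `arm_neon.h` rather than inline assembly. The function name `convert4_s32_to_f64` is hypothetical, and the scalar fallback is an addition so the snippet also builds on other targets. Every int32 value is exactly representable as a `double`, because a `double` has a 53-bit significand.

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: converts four int32 values to four doubles
 * by widening to 64-bit integers first, then converting. */
static void convert4_s32_to_f64(const int32_t in[4], double out[4])
{
#if defined(__aarch64__)
    int32x4_t v  = vld1q_s32(in);
    int64x2_t lo = vmovl_s32(vget_low_s32(v));  /* sign-extend lanes 0 and 1 */
    int64x2_t hi = vmovl_high_s32(v);           /* sign-extend lanes 2 and 3 */
    vst1q_f64(out,     vcvtq_f64_s64(lo));      /* int64 -> double, lanes 0-1 */
    vst1q_f64(out + 2, vcvtq_f64_s64(hi));      /* int64 -> double, lanes 2-3 */
#else
    /* Portable fallback (assumption: used when AArch64 SIMD is unavailable). */
    for (int i = 0; i < 4; ++i)
        out[i] = (double)in[i];
#endif
}
```

The trade-off is that the result is twice as wide as a `float`, so each 128-bit register holds two values instead of four.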

answered Dec 13 '17 at 11:01

RanL


Doesn't NEON have conversion directly from packed 32-bit int to packed 64-bit `double` precision float? x86 SSE2 does. – Peter Cordes Dec 13 '17 at 12:13
Well NEON doesn't support double-precision at all. But AArch64 Advanced SIMD does, and that's what you're using (`scvtf`). And it looks like it doesn't currently support packed 32-bit int -> `double`, only scalar (like ARM32 VFP). So yes, converting to 64-bit int first is probably a good bet vs. using scalar with one `sshll` instruction. (And `shll2` for the upper half of a 16-byte vector). – Peter Cordes Dec 13 '17 at 13:34
I think what you used was actually a VFP instruction. – Jake 'Alquimista' LEE Dec 14 '17 at 08:35
