-1

In NEON inline assembly, after converting from signed int32 to float, the number is different.

Here is the printed output for the float and signed int32 values: [image]

It differs seemingly at random (not only for every even number). The conversion is the only operation between storing the value as sint32 and storing it as float; nothing else is done.

How can I avoid it? Thanks.

RanL

3 Answers

4

A float has only 23 bits explicitly assigned to the mantissa (24 bits of precision counting the implicit leading bit), with a separate sign bit (MSB).

Hence any int32 outside the -2^24 ~ 2^24 window will lose precision during the conversion (truncation occurs).

It is nothing ARM/NEON specific.

https://en.wikipedia.org/wiki/Single-precision_floating-point_format

Jake 'Alquimista' LEE
  • The significands (the preferred term; significands are linear, whereas mantissa is historically logarithmic) have 24 bits. Although only 23 bits are explicitly stored, the 24th bit is inferred from the significand and exponent combined. This makes the range of representable integers from -2^24 to +2^24, not 2^23. – Eric Postpischil Dec 12 '17 at 12:24
  • @EricPostpischil You are right! I'll amend my answer accordingly. – Jake 'Alquimista' LEE Dec 12 '17 at 12:25
  • Some integers outside that range *are* exactly representable. Specifically, integers with enough trailing zeros. i.e. which have only 24 or fewer significant digits in base 2. So it's not quite true that *any* `int32` outside that range loses precision; e.g. any power of 2 (lower than FLT_MAX) can be exactly represented, including values outside the range that `int32_t` can hold. All the `float` values outside that 24-bit range are integers. – Peter Cordes Dec 13 '17 at 05:43
  • And they're probably not truncated toward zero; don't ARM int<->float conversions use the default rounding mode (round to nearest-even)? – Peter Cordes Dec 13 '17 at 05:45
  • @PeterCordes It depends on the configuration. And I've been working mostly on fixed numbers, avoiding float types if possible. – Jake 'Alquimista' LEE Dec 13 '17 at 07:02
1

The significands (fraction portions) of single-precision floating-point numbers are only 24 bits. (23 bits are explicitly stored; 1 is inferred from the exponent and significand combined.) So integers with magnitudes above 2^24 generally have to be rounded to fit in the floating-point format, unless they happen to need no more than 24 significant bits.

Eric Postpischil
1

Solved by using NEON instructions to convert to 64-bit int first and then to 64-bit float.

RanL
  • Doesn't NEON have conversion directly from packed 32-bit int to packed 64-bit `double` precision float? x86 SSE2 does. – Peter Cordes Dec 13 '17 at 12:13
  • Well NEON doesn't support double-precision at all. But AArch64 Advanced SIMD does, and that's what you're using (`scvtf`). And it looks like it doesn't currently support packed 32-bit int -> `double`, only scalar (like ARM32 VFP). So yes, converting to 64-bit int first with one `sshll` instruction (and `sshll2` for the upper half of a 16-byte vector) is probably a good bet vs. using scalar. – Peter Cordes Dec 13 '17 at 13:34
  • I think what you used was actually a VFP instruction. – Jake 'Alquimista' LEE Dec 14 '17 at 08:35