ARM Cortex-M4F sqrtf uses double

Question

I am working on code for an ARM Cortex-M4F with an fpv4-sp-d16 FPU and the hard-float ABI.

While inspecting my ELF file with Ozone, I found that my code uses some double-precision support functions, e.g. __aeabi_dmul.

If I don't use sqrtf, or if I use the -ffast-math option, the double support functions are no longer linked. Since I do not know what other implications -ffast-math has, I would rather not use that option.

Is there any other option to get a float sqrt function?

I am using ARM-GCC 10 2021.10 with -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -Os -fsingle-precision-constant -Wfloat-equal -Wdouble-promotion

In the disassembly I found that __aeabi_ddiv is called inside sqrtf.

The suggested solution:

float sqrtf_wrap(float x) __attribute__((optimize("fast-math")));
float sqrtf_wrap(float x)
{
    if (x >= 0.0f)
        return __builtin_sqrtf(x);
    else
        return NAN;
}

does not change the build result: __aeabi_ddiv is still used.

Update/Solution

I found a question where someone had the same problem: Double division in sqrtf? [UPDATE]

His solution, calling __ieee754_sqrtf directly, works for me as well (that function uses the VSQRT.F32 instruction directly).

Furthermore, I found that sqrtf with -fno-math-errno also works.

Unfortunately, a wrapper with local optimization options does not work; it seems that the optimization is not applied to the called math function.

Therefore I will use __ieee754_sqrtf instead of sqrtf.

Is it possible that you use `double` somewhere else (by accident), and that when you remove the `fsqrt()` those uses are either no longer there or optimized out? – 12431234123412341234123, Jul 20 '23 at 14:42

0___________ · Answer 1 · 2023-07-20T15:07:00.123

You need to use sqrtf; otherwise the argument has to be converted to double and sqrt is called.

-ffast-math is not needed for this to compile correctly, but it enables more optimizations and breaks strict IEEE conformance (https://simonbyrne.github.io/notes/fastmath/)

Example:

float mysqrtf(float x)
{
    return sqrtf(x);
}

Resulting code:

Without -ffast-math

mysqrtf:
        vcmp.f32        s0, #0
        vmrs    APSR_nzcv, FPSCR
        vmov.f32        s15, s0
        bmi     .L10    //call sqrtf if parameter is negative
        vsqrt.f32       s0, s15
        bx      lr
.L10:
        b       sqrtf

With -ffast-math

mysqrtf:
        vsqrt.f32       s0, s0  //no check whether the parameter is negative. If it is, the instruction raises a floating-point exception (Invalid Operation or Inexact)
        bx      lr

Command line options: -O2 -mcpu=cortex-m4 -mthumb <-ffast-math> -mfloat-abi=hard -mfpu=fpv4-sp-d16

edited Jul 20 '23 at 15:07

answered Jul 20 '23 at 14:06

@artlessnoise no mate. Without fast-math it is calling `sqrtf` only in some edge cases (basically to correctly handle NaNs, set `errno` if needed etc). It answers the OP's question. It eliminates conversion to doubles and in **most** cases uses the hardware FPU instruction. I do not see any complaints about linker problems in the OP's question – 0___________ Jul 20 '23 at 15:00
@artlessnoise no again. It is only called if the parameter is wrong (i.e. negative or NaN). If you pass a correct parameter, `vsqrt` will be executed. EOD. – 0___________ Jul 20 '23 at 15:30
Without a minimal reproducible example from the OP we cannot explain why they are getting double math; it is probably something trivial like 1.234 vs 1.234F. I see this as the answer to the question thus far: cortex-m4 sqrtf being used without any doubles showing up – old_timer Jul 20 '23 at 17:33

artless noise · Answer 2 · 2023-07-24T14:45:56.147

Here is a code snippet which I feel is a good fit to your question.

#include <math.h>

#ifdef __GNUC__
# ifdef __OPTIMIZE__
float sqrtf_wrap(float x) __attribute__((optimize("fast-math")));
# else
float sqrtf_wrap(float x) __attribute__((optimize("1"), optimize("fast-math")));
# endif
float sqrtf_wrap(float x)
{
    if (x >= 0.0f)
        return __builtin_sqrtf(x);
    else
        return NAN;
}
#else
 /* ??? link error at least ??? */
#endif

This avoids the call to the library's sqrtf(), which is possibly responsible for pulling in the double utility functions (such as __aeabi_dmul) that a single-precision floating-point unit needs for double arithmetic.

The wrapper:

Is fairly efficient.
Avoids linking sqrtf() and any dependent calls.
Returns NAN when called with inappropriate arguments.
Does not enable fast-math globally.
Does not set errno on abnormal arguments.

It depends on gcc features and is not as maintainable as a standard sqrtf() call. The main benefit is less code linked into the binary. The presence of functions such as __aeabi_dmul often indicates unintended use of doubles, which matters on a single-precision FPU for both code density and speed (accepting less precision). That is to say, checking that no double support functions appear in a linked object or map file can be an important maintenance step, and so the wrapper might be a necessary evil.

Also, it is beneficial to use -Wdouble-promotion on a CPU like this, especially to check that float constants have a trailing 'f' to specify single precision (a float rather than a double constant).
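
As a small illustration of what that warning catches (both function names are hypothetical): in the first function the unsuffixed constant is a double, so `x` is implicitly promoted and the multiplication is done in double precision. On a single-precision FPU that typically means software helpers such as `__aeabi_f2d`, `__aeabi_dmul` and `__aeabi_d2f`. This assumes `-fsingle-precision-constant` is not in effect; with that option the constant is treated as a float and the two functions behave the same.

```c
/* Unsuffixed constant: 0.1 has type double, so x is promoted to double,
   multiplied in double, and the result converted back to float.
   -Wdouble-promotion reports the implicit float-to-double conversion here. */
float scale_promoted(float x)
{
    return x * 0.1;
}

/* Suffixed constant: 0.1f has type float, so the multiply stays in
   single precision and can use the hardware FPU directly. */
float scale_single(float x)
{
    return x * 0.1f;
}
```

The two versions can differ in the last bit of the result, because the first rounds once from a double product and the second works with the float approximation of 0.1 throughout.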

edited Jul 24 '23 at 14:45

answered Jul 21 '23 at 14:59

It can be [difficult to get fast-math to apply locally](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50782) but as [Godbolt shows](https://godbolt.org/z/vhEPsThqq), this works. – artless noise Jul 21 '23 at 15:23
1. It is not IEEEE.... compliant. 2. It does not set errno 3. wrong comparison. 4. It is only gcc. – 0___________ Jul 21 '23 at 22:19
4 was already stated (but it likely works with clang). I fixed 3. It is unlikely people care about 2, but now noted. In regards to 1, it will return NAN on error cases and worse issues also apply to use of `-ffast-math` and no handling. – artless noise Jul 23 '23 at 15:27
It does not work for me: sqrtf is still linked and with it the double utility functions – Omega Jul 24 '23 at 07:33
@user3653656 Can you use `objdump -S target.o` and post the generated assembler? Alternatively, you can use the godbolt link and update compiler options to exactly as you use to see if it affects things. As the [other link shows](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50782), it can be difficult to get fast math to apply locally as it can affect the calling function as well, which can restrict optimizations. Oh, most definitely you need to turn optimizations on. – artless noise Jul 24 '23 at 13:55
