I have this very simple code:

#include <stdio.h>
#include <math.h>
int main()
{
    long v = 35;
    double app = (double)v;
    app /= 100;
    app = log10(app);
    printf("Calculated log10 %lf\n", app);
    return 0;
}

This code works perfectly on x86 but not on ARM, where the printed result is 0.00000. Any ideas?

Other info:

Operating system: Linux 3.2.27

I built the ARM toolchain with ct-ng (crosstool-NG); its prefix is arm-unknown-linux-gnueabi-

libc version: 2.13

Output of gcc -v:

Using built-in specs.
COLLECT_GCC=arm-unknown-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/opt/x-tools/arm-unknown-linux-gnueabi/libexec/gcc/arm-unknown-linux-gnueabi/4.5.1/lto-wrapper
Target: arm-unknown-linux-gnueabi
Configured with: /home/mirko/misc/rasppi-ct-ng-files/.build/src/gcc-4.5.1/configure --build=x86_64-build_unknown-linux-gnu --host=x86_64-build_unknown-linux-gnu --target=arm-unknown-linux-gnueabi --prefix=/opt/x-tools/arm-unknown-linux-gnueabi --with-sysroot=/opt/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi//sys-root --enable-languages=c --disable-multilib --with-pkgversion=crosstool-NG-1.9.3 --enable-__cxa_atexit --disable-libmudflap --disable-libgomp --disable-libssp --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-gmp=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-mpfr=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-mpc=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-ppl=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-cloog=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-libelf=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --enable-threads=posix --enable-target-optspace --with-local-prefix=/opt/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi//sys-root --disable-nls --enable-symvers=gnu --enable-c99 --enable-long-long
Thread model: posix
gcc version 4.5.1 (crosstool-NG-1.9.3)

MirkoBanchi
  • The question, and the example code, are incomplete. Missing are: include files and information about target operating system, toolchain and libc version. – unixsmurf Dec 20 '12 at 08:50
  • can you try app /= 100.0f ? – giorashc Dec 20 '12 at 08:50
  • [log and power function problem](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka8972.html) and [Calculate exp() and log() Without Multiplications](http://www.quinapalus.com/efunc.html) – Grijesh Chauhan Dec 20 '12 at 08:54
  • which device? it prints the same result as x86 on my arm device "-0.455932". How do you compile it? softfp? – auselen Dec 20 '12 at 08:56
  • Raspberry Pi device is armv6. How do you compile this? Are you running an hardfp system? ARM & FPs is not trivial. – auselen Dec 20 '12 at 09:04
  • Ok, so we are probably looking at some kind of hardfp/softfp mismatch here between the toolchain and the libc on the target. What Linux distribution are you running? On target primarily, but your build machine would also be useful to know. – unixsmurf Dec 20 '12 at 09:09
  • On target (raspberry pi) wheezy. On build machine is running Ubuntu 12.04. I tried to compile with -msoft-float and -mhard-float. With the first option builds but again a wrong result. With the second one also the build fails...the compiler says that the object file uses VFP register arguments but executable does not. Maybe this could be useful – MirkoBanchi Dec 20 '12 at 09:25
  • Can you build and test it with "-march=armv6 -mfpu=vfp -mfloat-abi=hard" – auselen Dec 20 '12 at 09:36
  • @auselen: Same error about VFP register arguments. – MirkoBanchi Dec 20 '12 at 09:38
  • Get a toolchain suitable for Raspberry Pi. Your toolchain doesn't support hardfloat convention in its libraries. It can produce object but can't link. http://stackoverflow.com/questions/13143393/distro-provided-cross-compiler-vs-custom-built-gcc/13147962#13147962 – auselen Dec 20 '12 at 09:44
  • @MirkoBanchi: if you are on Ubuntu - 'apt-get install arm-linux-gnueabi-gcc' will give you a softfp toolchain, pretty much guaranteed to be compatible with armel Wheezy. – unixsmurf Dec 20 '12 at 09:44
  • @unixsmurf afaik rasppi wheezy is armhf. // I should get one. – auselen Dec 20 '12 at 09:46
  • @auselen: that would depend on whether it was a raspian wheezy or a vanilla debian wheezy :) If it is the hardfp variant, simply substitute my comment above with 'apt-get install arm-linux-gnueabihf-gcc'. – unixsmurf Dec 20 '12 at 10:50
  • Raspberry PI provides vfp (vector floating point) unit, make sure your cross compiler tuned to target processor. http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html – albin Dec 20 '12 at 16:32
  • @albin: No - make sure your cross compiler is tuned to the target's C library. (Also try to make sure your C library supports the full functionality of your target processor.) – unixsmurf Dec 20 '12 at 17:28
  • @unixsmurf: cross compilers uses intrinsic library functions or dedicated fpu registers in order to satisfy mathematical operations . If you do not tune your compilation for your specific target processor, you cannot use proper function or register set to achieve your goal even if your C library supports it. have a look at http://stackoverflow.com/questions/11297965/integer-division-with-cortex-m0-under-rvds – albin Dec 20 '12 at 18:58
  • @albin: yes, and that helps you exactly how when your cross compiler and your target's C library use different ABIs? – unixsmurf Dec 21 '12 at 08:29
  • Can you also add your cross toolchain's 'gcc -v' so we can see how it is actually configured? – auselen Dec 21 '12 at 08:46
  • I edited the question with requested info. However looking at library on Raspi i found that ABI is arm-linux-gnueabihf and cross tool's one is arm-linux-gnueabi. Could This explain? – MirkoBanchi Dec 21 '12 at 10:38
  • Yes, that's the reason. However for explanation, can you look at my updated answer? If it is still confusing tell me which part, then may be I can improve it. – auselen Dec 21 '12 at 11:57
  • @auselen: Your answer explains. Thank you very much. – MirkoBanchi Dec 24 '12 at 21:47
  • @unixsmurf: I got the toolchain through `apt` (apt-get install gcc-arm-linux-gnueabihf). I can build the executable but it segfaults. Also a simple hello-world. Any idea? – MirkoBanchi Jan 15 '13 at 11:07
  • @MirkoBanchi: gcc-arm-linux-gnueabihf is built with a default architecture of armv7-a. You may need to manually specify -mcpu=arm1176jzf-s or -march=armv6zk. – unixsmurf Jan 15 '13 at 17:52
  • @unixsmurf: I tried both these options but i got this error: `sorry, unimplemented: Thumb-1 hard-float VFP ABI ` any idea? – MirkoBanchi Jan 17 '13 at 08:42
  • Aah ... it probably defaults to -mthumb too, so try adding -marm as well. – unixsmurf Jan 17 '13 at 11:43
  • Adding -marm i can build correctly but running also a simple hello world it segfaults. Maybe the libc of toolchain is compiled against a too recent kernel? – MirkoBanchi Jan 31 '13 at 09:23

1 Answer

Floating point support on ARM Linux distributions is not trivial. You should therefore use a toolchain that matches your target system, both its operating system and its hardware, and pass the right compiler switches.

The first thing you need to understand is ARM's calling convention, which answers the question "how are arguments passed when you call a function?". ARM, being a RISC architecture, can only operate on registers; there are no instructions that manipulate memory directly. To change a value in memory you must first load it into a register, modify it, and then store it back to memory.

When you call a function you may need to pass arguments to it. You could put the arguments on the stack (memory), but since ARM can only work with registers, the first thing your function would probably do is load them back into registers. To avoid this waste, the ARM calling convention passes arguments in registers. However, since ARM has a limited number of registers, the calling convention uses only the first four registers (r0-r3) for the first four arguments; any remaining arguments must still be passed on the stack.

The second thing is that early ARM cores had no floating point support at all; floating point operations were implemented in software. (This is still supported via gcc's -mfloat-abi=soft.)

We can easily demonstrate what this means with the following snippet.

float pi2(float a) {
    return a * 3.14f;
}

Compiling this with -c -O3 -mfloat-abi=soft and disassembling the object file with objdump gives us

00000000 <pi2>:
   0:   f24f 51c3   movw    r1, #62915  ; 0xf5c3
   4:   b508        push    {r3, lr}
   6:   f2c4 0148   movt    r1, #16456  ; 0x4048
   a:   f7ff fffe   bl  0 <__aeabi_fmul>
   e:   bd08        pop {r3, pc}

As you can see (actually it is not visible :) ), pi2 gets its parameter in r0, loads the constant 3.14f (bit pattern 0x4048f5c3) into r1 with the movw/movt pair, and calls __aeabi_fmul to multiply the two and return the result in r0. Since __aeabi_fmul uses the same calling convention, nothing has to be done with r0, which is why it never appears in the listing. All our function does is populate r1 and delegate to __aeabi_fmul.

When hardware floating point support was added to ARM, it came (again because of the architecture style) with its own set of registers (s0, s1, ...).

If we compile the same snippet with -c -O3 -mfloat-abi=softfp and dump it, we get

00000000 <pi2>:
   0:   eddf 7a04   vldr    s15, [pc, #16]  ; 14 <pi2+0x14>
   4:   ee07 0a10   vmov    s14, r0
   8:   ee27 7a27   vmul.f32    s14, s14, s15
   c:   ee17 0a10   vmov    r0, s14
  10:   4770        bx  lr
  12:   bf00        nop
  14:   4048f5c3    .word   0x4048f5c3

As you can see, the compiler no longer emits a call to __aeabi_fmul. Instead it loads 3.14 into s15, moves the argument from r0 to s14, and emits a vmul.f32 instruction. After the multiplication it moves the result from s14 back to r0, because the calling convention means any caller of this function expects to find it there.

Now if you think of pi2 as a library function provided to you by some third party, you can see that the soft and softfp implementations look the same to the caller, so you can use them interchangeably. If the system provides them for you, you don't need to care whether your app runs on a system with hardware floating point support or not. This was a good way to keep old software running on new hardware.

However, while it keeps compatibility, this approach introduces the overhead of moving values between ARM registers and FP registers. This obviously affects performance, and it is addressed by a new calling convention, called hard by gcc. This convention states that floating point arguments are passed in floating point registers, interleaved with the normal registers used for the other arguments, and that floating point values are returned in the floating point register s0.

Again, if we compile our snippet with -c -O3 -mfloat-abi=hard and dump it, we get

00000000 <pi2>:
   0:   eddf 7a02   vldr    s15, [pc, #8]   ; c <pi2+0xc>
   4:   ee20 0a27   vmul.f32    s0, s0, s15
   8:   4770        bx  lr
   a:   bf00        nop
   c:   4048f5c3    .word   0x4048f5c3

You can see that no values are moved between the two register files. The argument to pi2 is passed in s0, the compiler emits code to load 3.14 into s15, and vmul.f32 s0, s0, s15 leaves the result we want in s0.

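The big problem with this new convention is that while it improves the code produced by the compiler, it completely kills compatibility. You can't expect an application built with the hard convention to work with libraries built for soft/softfp, and an application built for softfp won't work with libraries built for hard. That is the situation in the question: the cross toolchain targets arm-linux-gnueabi, while the libraries on the Raspberry Pi are arm-linux-gnueabihf.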

For more information on calling conventions you should check ARM's website.

auselen