
I disassembled an ARM binary that was compiled with the following NEON flags:

-mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize

The dump shows a vdiv.f64 instruction generated by the compiler. According to the ARM manual for ARMv7 (Cortex-A9), the NEON SIMD ISA does not support the vdiv instruction, but the floating-point (VFP) engine does. Why is this instruction generated? Is it a floating-point instruction that will be executed by the VFP? Both NEON and VFP support floating-point addition and multiplication, so how can I differentiate them from each other?

Tamar E. Granor
  • Yes, this is a VFP instruction. You can easily see this, because AArch32 NEON doesn't work on 64-bit floating-point at all. – EOF Jul 27 '16 at 14:08
  • Thank you for your answer, but what if I see a "vadd" generated by the compiler? How can I know whether it is a NEON or a VFP instruction, since both engines implement this instruction? I am working with an ARM Cortex-A9 processor and the arm-none-linux-gnueabi* toolchain. – raul garcia Jul 27 '16 at 14:14
  • NEON uses register names `D[n]` and `Q[n]` and instruction postfixes `F32` (and `I[n]` for integer instructions); VFP uses `S[n]` and `D[n]` and instruction postfixes `F64` or `F32`. It turns out that the combination is unambiguous. – EOF Jul 27 '16 at 14:18

1 Answer


In the case of Cortex-A9, the NEON FPU option also implements VFP; it is a superset of the cut-down 16-register VFP-only FPU option.

More generally, the architecture does not allow implementing floating-point Advanced SIMD without also implementing at least single-precision VFP, therefore GCC's -mfpu=neon implies VFPv3 as well. It is permissible to implement integer-only Advanced SIMD without any floating-point capability at all, but I'm not sure GCC can support that (or that anyone's ever built such a thing).

The actual VFP and Advanced SIMD variants of instructions are unambiguous from the syntax - anything operating on double-precision data (i.e. <op>.F64) is obviously VFP, as Advanced SIMD doesn't support double-precision. Single precision operations (i.e. <op>.F32) operating on 32-bit s registers are scalar, thus VFP; if they're operating on larger 64-bit d or 128-bit q registers, then they are handling multiple 32-bit values at once, thus are vectorised Advanced SIMD instructions.
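
As a rough illustration of that rule, here is a minimal Python sketch that classifies a single disassembly line. The function name `classify` and the regular expressions are made up for this example and are not part of any toolchain. It assumes ARMv7 (AArch32) syntax as printed by a typical disassembler, and it only handles instructions with a single `.f32` or `.f64` suffix; anything else (conversions such as `vcvt.f64.f32`, integer types, untyped moves and loads) is reported as "unknown" rather than guessed.

```python
import re

# Hypothetical helper for illustration only.
# Matches a mnemonic such as "vadd.f32" followed by its operand list.
_INSN = re.compile(r"\b(v[a-z]+)\.([a-z0-9.]+)\s+(.+)$")
# Matches register names s<n>, d<n>, q<n> and captures the bank letter.
_REG = re.compile(r"\b([sdq])\d+\b")


def classify(line):
    """Return 'VFP', 'Advanced SIMD', or 'unknown' for one ARMv7 line."""
    m = _INSN.search(line.strip().lower())
    if m is None:
        return "unknown"
    suffix, operands = m.group(2), m.group(3)

    # Double precision: Advanced SIMD on ARMv7 has no .f64 arithmetic.
    if suffix == "f64":
        return "VFP"

    # Single precision: the register bank decides.
    if suffix == "f32":
        banks = set(_REG.findall(operands))
        if banks and banks <= {"s"}:
            return "VFP"            # scalar on 32-bit s registers
        if banks & {"d", "q"}:
            return "Advanced SIMD"  # several 32-bit lanes in d or q registers

    # Everything else is outside the scope of this sketch.
    return "unknown"


if __name__ == "__main__":
    for text in ("vdiv.f64 d16, d17, d18",
                 "vadd.f32 s0, s1, s2",
                 "vadd.f32 q8, q8, q9"):
        print(text, "->", classify(text))
```

Applied to the instruction from the question, `classify("vdiv.f64 d16, d17, d18")` gives "VFP", while a `vadd.f32` on q registers gives "Advanced SIMD".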

Notlikethat