
I am implementing a filter and I need to optimise the implementation as much as possible. I have realised that one instruction needs a lot of cycles, and I do not understand why:

bool filters_apply(...)
{
   short sSample;
   double dSample;
   ...
   ...
   sSample = (short) dSample;   //needs a lot of cycles to execute
   ...
   ...
}
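To see what the cast actually turns into, it can help to isolate it in a tiny, self-contained file and look at the assembly GCC generates with the same flags as above plus -S. The names convert_one and convert_buffer are hypothetical; they only reproduce the cast from the question. With -mfloat-abi=softfp and -mfpu=vfp one would expect a VFP conversion instruction followed by a move from a VFP register to an ARM core register, but check the generated code rather than relying on this expectation.

```c
#include <stddef.h>

/* Hypothetical stand-in for the slow line in filters_apply():
 * the same double -> short cast, isolated so its assembly is easy to find. */
short convert_one(double dSample)
{
    /* C truncates toward zero; if the integral part does not fit in a
     * short, the behaviour is undefined, so the caller must keep
     * dSample in range. */
    return (short) dSample;
}

/* Hypothetical loop that performs the cast once per sample. */
void convert_buffer(const double *in, short *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = convert_one(in[i]);
}
```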

I am using the GCC options -mcpu=arm926ej-s -mfloat-abi=softfp -mfpu=vfp. I have tried compiling with the "hard" float ABI to see if there is a difference, but the compiler does not support it.

Could anyone explain why that instruction needs so many cycles?

Thanks a lot!!

Alicia R.

1 Answer


Judging only from the information you've provided, it may be caused by the stalls that occur when you transfer data from a floating-point (VFP) register to an ARM core register.

This Debian page on ARM floating-point modes claims that such an operation can take around 20 cycles.
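A rough way to check that figure on your own board is to time a loop that casts every element to short against one that keeps everything in double. The functions below are hypothetical and deliberately do not compute the same result; they differ mainly in whether each iteration needs a VFP-to-ARM register transfer. clock() has coarse resolution, so this is only a sketch: use a large buffer, repeat the runs, and inspect the assembly to confirm what each loop really does.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical loop A: every element is cast to short, so each result
 * has to be moved from a VFP register to an ARM core register. */
long sum_as_short(const double *in, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (short) in[i];   /* inputs must fit in a short */
    return acc;
}

/* Hypothetical loop B: the accumulator stays a double, so the loop body
 * should not need any VFP-to-ARM transfers. */
double sum_as_double(const double *in, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += in[i];
    return acc;
}

/* CPU time elapsed since start, in seconds (coarse resolution). */
double elapsed_seconds(clock_t start)
{
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

For example, take t0 = clock(), call sum_as_short(buf, n) on a large buffer, read elapsed_seconds(t0), then repeat with sum_as_double. Print or otherwise use both return values so the compiler cannot discard the calls.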

Try to use floating-point variables as much as possible; for example, make sSample a float. Your ARM926EJ-S (VFPv2) should provide 32 single-precision (16 double-precision) registers.
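A minimal sketch of that advice, assuming a simple one-pole low-pass filter purely for illustration (the filter, lowpass_t and lowpass_process are hypothetical, not the poster's code): the state and every intermediate value stay in float, so nothing in the loop needs to leave the VFP register file.

```c
#include <stddef.h>

/* Hypothetical filter state: a one-pole low-pass, used only to show
 * keeping every intermediate value in single precision. */
typedef struct {
    float a;   /* smoothing coefficient, 0 < a <= 1 */
    float y;   /* previous output */
} lowpass_t;

/* Processes n samples entirely in float; there is no conversion to
 * short here, so values can stay in VFP registers inside the loop. */
void lowpass_process(lowpass_t *f, const float *in, float *out, size_t n)
{
    float y = f->y;              /* copy state into a local */
    const float a = f->a;
    for (size_t i = 0; i < n; i++) {
        y += a * (in[i] - y);    /* y[n] = y[n-1] + a * (x[n] - y[n-1]) */
        out[i] = y;
    }
    f->y = y;                    /* write state back once */
}
```

Keeping the state in a local variable gives the compiler the chance to hold it in a register for the whole loop instead of reloading it through the pointer.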

auselen
  • Later I have to copy sSample into a short buffer that holds the frame for the audio device, which means that sooner or later I have to convert from double to short. – Alicia R. Jun 13 '13 at 13:20
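The conversion to short cannot be avoided in that case, but it can be confined to a single pass at the very end, once all filtering has been done in floating point. The sketch below assumes float samples (the same code works for double by changing the type); float_to_pcm16 is a hypothetical name. It clamps before casting because casting an out-of-range value to short is undefined behaviour in C. Whether a separate pass is actually faster on the ARM926EJ-S is something to measure, since each sample still needs one transfer to an ARM register.

```c
#include <stddef.h>

/* Hypothetical helper: converts a whole block of float samples to the
 * 16-bit frame buffer in one pass, after all filtering is done. */
void float_to_pcm16(const float *in, short *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float v = in[i];
        /* Clamp first: casting an out-of-range float to short is
         * undefined behaviour in C (NaN inputs are not handled). */
        if (v > 32767.0f)
            v = 32767.0f;
        else if (v < -32768.0f)
            v = -32768.0f;
        out[i] = (short) v;   /* truncates toward zero */
    }
}
```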