
I have an algorithm in an embedded system that needs to calculate sin(theta), sin(2*theta), sin(3*theta), etc. with Q15 fixed-point arithmetic. sin(theta) and cos(theta) are generated by a LUT/interpolation combo, but I'm using the Chebyshev method to calculate the higher-order sines, which looks like this (pseudo-code):

    sinN1 = ...;  // comes from the Q15 LUT/interp algorithm
    cosN1 = ...;  // comes from the Q15 LUT/interp algorithm

    sinN2 = (cosN1*sinN1) >> 14;              // sin(2*theta)
    sinN3 = ((cosN1*sinN2) >> 14) - sinN1;    // sin(3*theta)
    sinN4 = ((cosN1*sinN3) >> 14) - sinN2;    // sin(4*theta)
    ....
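
For reference, the recurrence being applied is the identity sin(n*theta) = 2*cos(theta)*sin((n-1)*theta) - sin((n-2)*theta); the factor of 2 is what the >>14 (rather than >>15) accounts for.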

The problem is that under certain conditions this method yields a result that can overflow a Q15 variable. For example, let's consider theta = 2.61697:

    sinN1 (Q15) = int(2**15*sin(2.61697)) = 16413
    cosN1 (Q15) = int(2**15*cos(2.61697)) = -28361
    sinN2 = (-28361*16413)>>14 = -28412              # OK
    sinN3 = ((-28361*-28412)>>14) - 16413 = 32768    # OVERFLOW BY 1
    ..
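
For reference, the same case as plain C (assuming 32-bit int intermediates and an arithmetic right shift on negative values, which is what the figures above imply):

    #include <stdio.h>
    #include <stdint.h>

    /* Minimal reproduction of the overflow case above. */
    int main(void)
    {
        int16_t sinN1 = 16413, cosN1 = -28361;
        int32_t sinN2 = ((int32_t)cosN1 * sinN1) >> 14;            /* -28412: OK */
        int32_t sinN3 = (((int32_t)cosN1 * sinN2) >> 14) - sinN1;  /* 32768: one LSB past INT16_MAX */
        printf("%ld %ld\n", (long)sinN2, (long)sinN3);
        return 0;
    }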

I never seem to overflow by more than an LSB or two; it appears to be an artifact of compounding quantization error. I'm using an ARM Cortex-M4 processor, so I can add saturation logic with relatively few instructions, but I'm doing a lot of real-time streaming DSP with very low latency requirements and need to save as much CPU as I can, so I'm wondering if there is a more elegant way to handle this issue.

Darko
  • You could probably use unsigned arithmetic, since the +/- sign will be determined by a check on the quadrant. There may be other ways to estimate the function and/or tweak the coefficients. You didn't give the actual function that is being estimated. – artless noise Mar 10 '15 at 14:35

1 Answer


The mathematician in me wants to suggest that keeping an estimate of the accumulated error and correcting for it in a sigma-delta fashion would be the elegant solution, but the pragmatic programmer in me says that's insanely impractical.

The SSAT instruction will saturate your result to any position you want, costs a single cycle, and should be easily available via the __ssat intrinsic on any non-rubbish compiler. Since you will inevitably accumulate errors with any non-integer arithmetic, I'm not sure there's really anything better than just doing the calculation and spending one extra cycle making sure it's in range.
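
Something like this sketch, assuming a toolchain that exposes the ACLE __ssat intrinsic (CMSIS has the equivalent __SSAT); the helper name and wrapper are just for illustration:

    #include <stdint.h>
    #include <arm_acle.h>   /* __ssat() */

    /* One recurrence step: sin(n*t) = 2*cos(t)*sin((n-1)*t) - sin((n-2)*t), all Q15.
       The final __ssat clamps to a signed 16-bit range, so an off-by-an-LSB
       overshoot saturates to 32767 instead of wrapping. */
    static inline int16_t cheb_step_q15(int16_t cosN1, int16_t sinPrev1, int16_t sinPrev2)
    {
        int32_t t = ((int32_t)cosN1 * sinPrev1) >> 14;   /* Q15*Q15*2 -> Q15 */
        return (int16_t)__ssat(t - sinPrev2, 16);        /* clamp to [-32768, 32767] */
    }

so sinN2 = cheb_step_q15(cosN1, sinN1, 0);, sinN3 = cheb_step_q15(cosN1, sinN2, sinN1);, and so on.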


What I can't quite work out is whether you could get it entirely for free by dabbling in a bit of (inline) assembly to get at the optional shift in SSAT that the intrinsic doesn't expose, on the basis that the multiplication/fixed-point-correction step is the biggest source of error; something like this (pseudo-assembly):

    mul  tmp, cosN1, sinN2
    ssat tmp, #16, tmp, asr #14
    sub  sinN3, tmp, sinN1

You can only safely get away with that if you can guarantee never to end up with something like sinN3 = 0 - (-32768). QSUB16 is tempting as a replacement for the SUB, but being a parallel operation it would do weird stuff with the top halfword of sign bits, and as soon as you add a mask or halfword-packing instruction to correct for that you've lost the "for free" game.
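
If you do want to experiment with that, a rough GCC/Clang extended-asm sketch (untested; the helper name is just for illustration) might look like:

    #include <stdint.h>

    /* The "ssat with a built-in asr #14" idea for Cortex-M4. Note the trailing
       subtraction is NOT saturated, so this is only safe if the result can
       never leave the Q15 range. */
    static inline int16_t cheb_step_q15_asm(int16_t cosN1, int16_t sinPrev1, int16_t sinPrev2)
    {
        int32_t tmp = (int32_t)cosN1 * sinPrev1;                /* full 32-bit product */
        __asm__ ("ssat %0, #16, %1, asr #14" : "=r"(tmp) : "r"(tmp));
        return (int16_t)(tmp - sinPrev2);                       /* unsaturated SUB */
    }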

Notlikethat