I have an algorithm in an embedded system that needs to calculate sin(theta), sin(2*theta), sin(3*theta), etc. with Q15 fixed-point arithmetic. sin(theta) and cos(theta) are generated using a LUT/interpolation combo, but I'm using the Chebyshev method to calculate the higher-order sines, which looks like this (pseudo-code):
sinN1 = Comes from Q15 LUT/interp algorithm
cosN1 = Comes from Q15 LUT/interp algorithm
sinN2 = (cosN1*sinN1)>>14;
sinN3 = ((cosN1*sinN2)>>14) - sinN1;
sinN4 = ((cosN1*sinN3)>>14) - sinN2;
....
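For reference, the recurrence behind these lines can be written out as below. This is only a restatement under assumed notation: $s_n$ stands for the Q15 value of sin(n*theta) and $c$ for the Q15 value of cos(theta) (symbols introduced here for illustration), and the floor assumes `>>` is an arithmetic right shift. The Q15 by Q15 product is Q30, so a shift of 15 would return it to Q15; shifting by 14 instead folds in the factor of two.

$$\sin(n\theta) = 2\cos\theta\,\sin\big((n-1)\theta\big) - \sin\big((n-2)\theta\big)$$

$$s_n = \left\lfloor \frac{c \cdot s_{n-1}}{2^{14}} \right\rfloor - s_{n-2}, \qquad s_0 = 0$$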
The problem is that under certain conditions, this method yields a result that overflows a Q15 variable. For example, let's consider theta = 2.61697:
sinN1 (Q15) = int(2**15*sin(2.61697)) = 16413
cosN1 (Q15) = int(2**15*cos(2.61697)) = -28361
sinN2 = (-28361*16413)>>14 = -28412 # OK
sinN3 = ((-28361*-28412)>>14) - 16413 = 32768 # OVERFLOW BY 1 (Q15 max is 32767)
..
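The numbers above can be reproduced with a small Python model of the integer arithmetic. This is a sketch, not the target code: the helper names `to_q15` and `chebyshev_sines` are made up for illustration, and Python integers do not wrap, so the overflow shows up as a value outside the Q15 range rather than as a wrapped `int16_t`. Python's `>>` floors negative values, which matches the -28412 in the example.

```python
import math

Q15_ONE = 1 << 15        # Q15 scale factor, 2**15
Q15_MAX = Q15_ONE - 1    # 32767
Q15_MIN = -Q15_ONE       # -32768


def to_q15(x):
    """Quantize a float to Q15, truncating toward zero like int() in the example."""
    return int(Q15_ONE * x)


def chebyshev_sines(sin1, cos1, count):
    """Return unclamped integer values for sin(theta) .. sin(count*theta).

    sin1 and cos1 are Q15 integers; no saturation or wrapping is applied,
    so out-of-range results stay visible.
    """
    sines = [sin1]
    prev = 0  # sin(0*theta)
    while len(sines) < count:
        # The Q15*Q15 product is Q30; >>14 returns to Q15 and doubles it.
        nxt = ((cos1 * sines[-1]) >> 14) - prev
        prev = sines[-1]
        sines.append(nxt)
    return sines


theta = 2.61697
sines = chebyshev_sines(to_q15(math.sin(theta)), to_q15(math.cos(theta)), 3)
for n, value in enumerate(sines, start=1):
    status = "ok" if Q15_MIN <= value <= Q15_MAX else "OVERFLOW"
    print(n, value, status)
```

With the values from the example, the third entry comes out as 32768, one LSB above the largest Q15 value.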
I never seem to overflow by more than an LSB or two, so it looks like an artifact of compounding quantization error. I'm using an ARM Cortex-M4 processor, so I can add saturation logic with relatively few instructions. However, I'm doing a lot of real-time streaming DSP with very low latency requirements, so I need to save as much CPU as I can. Is there a more elegant way to handle this issue?