Fast small-angle sine/cosine approximation

Question

I'm doing some rigid-body rotation dynamics simulation, which means I have to compute many rotations by a small angle, and evaluating the trigonometric functions is a performance bottleneck. Right now I do it with a Taylor (Maclaurin) series:

class double2{
  double x,y;
  // Intrinsic full sin/cos
  final void rotate   ( double a){ 
     double x_=x; 
     double ca=Math.cos(a); double sa=Math.sin(a); 
     x=ca*x_-sa*y; y=sa*x_+ca*y; 
  }
  // Taylor 7th-order approximation
  final void rotate_d7( double a){ 
     double x_=x;
     double a2=a*a;
     double a4=a2*a2;
     double a6=a4*a2;
     double ca= 1.0d - a2  /2.0d + a4  /24.0d  - a6/720.0d;
     double sa=   a  - a2*a/6.0d + a4*a/120.0d - a6*a/5040.0d; 
     x=ca*x_-sa*y; y=sa*x_+ca*y; 
  }
}

but the trade-off between accuracy and speed is not as good as I would expect:

                     error(100x dphi=Pi/100 )    time [ns per rotation]
  v.rotate_d1()   :  -0.010044860504615213    9.314306 ns/op 
  v.rotate_d3()   :   3.2624666136960023E-6  16.268745 ns/op 
  v.rotate_d5()   :  -4.600003294941146E-10  35.433617 ns/op 
  v.rotate_d7()   :   3.416711358283919E-14  49.831547 ns/op 
  v.rotate()      :   3.469446951953614E-16  75.70213  ns/op

Is there any faster method to evaluate an approximation of sin() and cos() for a small angle (like < Pi/100)?

I was thinking of maybe some rational series, or a continued fraction approximation. Do you know any? (A precomputed table doesn't make sense here.)

Is there any reason why you want to use the series rather than the sincos() function? Could you also state what language you are using? — camelccc, Aug 09 '13 at 12:35
have a look here http://stackoverflow.com/questions/13460693/using-sincos-in-java — camelccc, Aug 09 '13 at 12:43
why doesn't precomputed table make sense in your situation? memory limitations? reducing precision would limit the number of values you need to store, which is already limited since angle < pi/100 — Graham Griffiths, Aug 09 '13 at 12:47
Have a look at [CORDIC](http://en.wikipedia.org/wiki/CORDIC). — Sven, Aug 09 '13 at 12:52
what language and platform are you running on? it will make a big difference. @Sven - I doubt CORDIC coded up in software will be fast - although it definitely is running on an FPGA. — Graham Griffiths, Aug 09 '13 at 12:59
I'm using Java. fast_sincos would probably be good, but if I understand correctly it is not available in Java(?). A table does not make sense because in this case (small angle) the accuracy/speed trade-off favors a power series (according to my tests). You have to interpolate anyway, and it is faster to just evaluate a cubic polynomial than to compute an index, fetch 4 numbers from the table, and then interpolate with a cubic. — Prokop Hapala, Aug 09 '13 at 13:17
It's worth noting that the taylor coefficients are not necessarily the ones that minimize error over your range. — Graham Griffiths, Aug 09 '13 at 13:48
if it's worth using double rather than float, you're going to need a ridiculously huge lookup table if you try that. How are you calculating the angle in the first place though? if it's a simulation, I'd look at creating a rotation matrix that doesn't depend on calculating it, then taking sin cos. — camelccc, Aug 09 '13 at 15:58

score 3 · Answer 1 · answered Aug 09 '13 at 12:56

You might find that adjusting your calculations can improve performance. E.g.:

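// Coefficients precomputed once, so no division happens per call
final double c7 = -1/5040d;
final double c5 = 1/120d;
final double c3 = -1/6d;

double a2 = a * a;

// Horner form of a - a^3/6 + a^5/120 - a^7/5040
double sa = (((c7 * a2 + c5) * a2 + c3) * a2 + 1) * a;
// similarly for cos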

Now the optimiser might be doing some of this itself anyway, so your mileage may vary. Would be interested to know the results either way.
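To make the "similarly for cos" part concrete, here is a minimal sketch of both polynomials in the same nested (Horner) form, at the same order as rotate_d7 in the question. The class name HornerTrig and its method names are made up for illustration, and nothing here has been benchmarked, so treat it as a starting point rather than a measured result.

```java
// Hypothetical helper, not from the question: 7th-order sin and 6th-order cos
// Taylor polynomials, evaluated in Horner form in a^2.
final class HornerTrig {
    // Taylor coefficients, computed once at class load (no division per call).
    static final double S3 = -1.0 / 6.0, S5 = 1.0 / 120.0, S7 = -1.0 / 5040.0;
    static final double C2 = -1.0 / 2.0, C4 = 1.0 / 24.0, C6 = -1.0 / 720.0;

    // sin(a) ~ a - a^3/6 + a^5/120 - a^7/5040
    static double sin7(double a) {
        double a2 = a * a;
        return (((S7 * a2 + S5) * a2 + S3) * a2 + 1.0) * a;
    }

    // cos(a) ~ 1 - a^2/2 + a^4/24 - a^6/720
    static double cos6(double a) {
        double a2 = a * a;
        return ((C6 * a2 + C4) * a2 + C2) * a2 + 1.0;
    }

    // Rotates the vector (x, y) by the small angle a; returns {x', y'}.
    static double[] rotate(double x, double y, double a) {
        double ca = cos6(a), sa = sin7(a);
        return new double[] { ca * x - sa * y, sa * x + ca * y };
    }

    public static void main(String[] args) {
        double a = Math.PI / 100;
        // Differences from the library functions for one small angle.
        System.out.println(sin7(a) - Math.sin(a));
        System.out.println(cos6(a) - Math.cos(a));
    }
}
```

Each polynomial costs one multiplication for a2 plus three multiply-add steps (and one extra multiplication by a for the sine), with no divisions in the hot path.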

Iridium


Yes, never do division by a constant. The strict rules of floating point probably preclude the compiler doing algebraic manipulations, though -flags may allow that. And this representation of polynomials is nice too since it reduces operation count and breaks it into MAC, MAC, MAC, MUL. – phkahler Aug 09 '13 at 13:05
OK, with this I got a 2x speedup. – Prokop Hapala Aug 09 '13 at 13:51

score 1 · Answer 2 · answered Aug 09 '13 at 14:53

Instead of optimizing the trig functions, see if you can do without them. Rigid-body simulations tend to be a perfectly natural fit for vector math.

Jasper Bekkers


Yes, I'm using vector math (or even better, complex numbers in 2D and quaternions in 3D), but I still want to go beyond the first-order expansion (to make the time step longer and keep high precision). And the exact solution is a rotation matrix composed of sine and cosine. So it is good to expand the exact solution into a higher-order Taylor (or other) series. – Prokop Hapala Aug 14 '13 at 23:04

score 0 · Answer 3 · answered Aug 09 '13 at 12:38

Two ways: first, reduce the precision if possible (as is often done in video games, use the minimal acceptable precision if you are aiming for performance).

Then you should try to use tabulated values. Once per execution (when the game loads?), compute an array of sine/cosine values that you then access in constant time.

float cosAlpha = COSINUS[(int)(k*alpha)]; // e.g: k = 1000

Tune k and the array size to trade angle resolution against memory footprint.

Edit: Don't forget to use the parity of the cosine/sine functions to avoid duplicate values in the table.

Edit 2: Try floats instead of doubles. The difference will be insignificant for the player, and the performance impact may be interesting. Test it!
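A minimal sketch of the table idea above, assuming k = 1000 entries per radian and a table covering only [0, pi]. The class name CosTable and both constants are hypothetical choices for illustration. Note that a plain truncating lookup like this is only accurate to roughly 1/k, which is far coarser than the polynomial results quoted in the question.

```java
// Hypothetical lookup table, built once at class load.
final class CosTable {
    static final int K = 1000;                // table entries per radian (the "k" above)
    static final double MAX_ANGLE = Math.PI;  // assumption: callers stay within [-pi, pi]
    static final float[] COS = new float[(int) (K * MAX_ANGLE) + 2];

    static {
        for (int i = 0; i < COS.length; i++) {
            COS[i] = (float) Math.cos(i / (double) K);
        }
    }

    // Truncating lookup, no interpolation. cos is even, so only |alpha| is
    // tabulated, which halves the table size.
    static float cos(double alpha) {
        return COS[(int) (K * Math.abs(alpha))];
    }

    public static void main(String[] args) {
        double alpha = 0.5004;
        System.out.println(cos(alpha) - Math.cos(alpha)); // lookup error at one angle
    }
}
```

Angles outside the assumed range would index past the end of the array, so a real version would need range reduction or a bounds check first.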

johan d


This is a pretty old-school optimization technique that might turn out to be quite a bit slower on recent hardware: if the data doesn't fit in cache, it can cost you 100s of cycles more than the actual calculation. Measure. – Jasper Bekkers Aug 09 '13 at 14:52
I don't want a very coarse approximation over the whole period. I want a very precise approximation for a small angle. I want to improve the precision of one movement iteration (rotating a vector by some small differential of angle dphi = Omega*dt). – Prokop Hapala Aug 14 '13 at 23:09

Graham Griffiths · Answer 4 · 2013-08-09T13:55:07.467

Can you add some inline assembler? Targeting the i386 'fsincos' instruction is probably the fastest method:

Vector2 unit_vector ( Angle angle ) {
  Vector2 r;

//now the normal processor detection
//and various platform specific versions

#  if defined (__i386__) && !defined (NO_ASM)
#    if defined __GNUC__
#      define ASM_SINCOS
      asm ("fsincos" : "=t" (r.x), "=u" (r.y) : "0" (angle.radians()));

#    elif defined _MSC_VER
#      define ASM_SINCOS
      double a = angle.radians();
      __asm fld a
      __asm fsincos
      __asm fstp r.x
      __asm fstp r.y
#    endif
#  endif
  return r;  // x = cos(angle), y = sin(angle) when one of the asm paths above is compiled in
}

from here. This has the added bonus of calculating both sin and cos in a single call.

EDIT : it's Java.

Are your rotations suitably self-contained that you can offload thousands at a time over JNI? Otherwise this hardware-specific approach is no good.

Pixdigit · Answer 5 · 2016-09-12T17:02:36.457

0

For small x (x < 0.2 radians) you can safely assume sin(x) ≈ x.

The maximum deviation is 0.0013.
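The quoted deviation can be checked numerically. The sketch below (the class name SmallAngleError is made up for illustration) samples |sin(x) - x| on a grid and compares it with the leading Taylor remainder x^3/6, both for the 0.2 rad limit above and for the Pi/100 step from the question.

```java
// Hypothetical check of the first-order approximation sin(x) ~ x.
final class SmallAngleError {
    // Largest |sin(x) - x| found on an evenly spaced grid over [0, xMax].
    static double maxDeviation(double xMax, int samples) {
        double worst = 0.0;
        for (int i = 0; i <= samples; i++) {
            double x = xMax * i / samples;
            worst = Math.max(worst, Math.abs(Math.sin(x) - x));
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println(maxDeviation(0.2, 1000));           // sampled deviation up to 0.2 rad
        System.out.println(0.2 * 0.2 * 0.2 / 6.0);             // Taylor bound x^3/6 at 0.2 rad
        System.out.println(maxDeviation(Math.PI / 100, 1000)); // same check for the question's step
    }
}
```

Because the error grows like x^3, it is much smaller at the question's step size than at 0.2 rad, but it is still well above what the higher-order polynomials in the question achieve.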

edited Sep 12 '16 at 17:02

answered Sep 12 '16 at 16:57
