15

The literature on computing the elementary function sin with tables refers to the formula:

sin(x) = sin(Cn) * cos(h) + cos(Cn) * sin(h)

where x = Cn + h, Cn is a constant for which sin(Cn) and cos(Cn) have been pre-computed and are available in a table, and, if following Gal's method, Cn has been chosen so that both sin(Cn) and cos(Cn) are closely approximated by floating-point numbers. The quantity h is close to 0.0. An example of reference to this formula is this article (page 7).

I don't understand why this makes sense: cos(h), however it is computed, will probably be wrong by at least 0.5 ULP for some values of h, and since it is close to 1.0, this seems to have a drastic effect on the accuracy of the result sin(x) when computed this way.

I do not understand why the formula below is not used instead:

sin(x) = sin(Cn) + (sin(Cn) * (cos(h) - 1.0) + cos(Cn) * sin(h))

Then the two quantities (cos(h) - 1.0) and sin(h) can be approximated with polynomials that are easy to make accurate as they produce results near zero. The values for sin(Cn) * (cos(h) - 1.0), cos(Cn) * sin(h) and for their sum is still small and its absolute accuracy is expressed in ULPs of the small quantity that the sum represents, so that adding this quantity to sin(Cn) is almost correctly rounded.

Am I missing something that makes the earlier, popular, simpler formula behave well too? Do the writers take it for granted that the readers will understand that the first formula is actually implemented as the second formula?

EDIT: Example

A single-precision table to compute single-precision sinf() and cosf() might contain the following point in single-precision:

         f             |        cos f          |       sin f      
-----------------------+-----------------------+---------------------
0.017967 0x1.2660bcp-6 |    0x1.ffead8p-1      |    0x1.265caep-6
                       |    (actual value:)    |    (actual value:)
                       | ~0x1.ffead8000715dp-1 | ~0x1.265cae000e6f9p-6

The following functions are specialized single-precision functions to use around 0.017967:

float sinf_trad(float x)
{
  float h = x - 0x1.2660bcp-6f;

  return 0x1.265caep-6f * cos_0(h) + 0x1.ffead8p-1f * sin_0(h);
}

float sinf_new(float x)
{
  float h = x - 0x1.2660bcp-6f;

  return 0x1.265caep-6f + (0x1.265caep-6f * cosm1_0(h) + 0x1.ffead8p-1f * sin_0(h));
}

Testing these functions between 0.01f and 0.025f seems to show that the new formula gives more precise results:

$ gcc -std=c99 test.c && ./a.out 
relative error, traditional: 2.169624e-07, new: 1.288049e-07
sum of squares of absolute error, traditional: 6.616202e-12, new: 2.522784e-12

I took several shortcuts so please look at the complete program.

Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
  • 2
    Have you had a chance to compare the measured worst case or average error of the classical trigonometric addition formula with your alternate computation? Various additional versions are possible. I recently cooked up the following FMA-friendly version based on the use of a half-angle formula but have not had time for a thorough analysis yet: `s0 = sin(Cn); c0 = cos(Cn); s1 = sin(h); c1 = sin(h/2); c1 = 2*c1*c1; sin(x) = sin(Cn+h) = fma (c0, s1, fma (-s0, c1, s0));` In particular, the order of the FMAs may want to be reversed for optimal results. – njuffa May 18 '14 at 10:35
  • @njuffa I am going to try to do the entire exercise in single-precision, which should save me from having to understand the article I take as example of the formula being proposed as standard. I also noticed in that same article the laconic phrase “the term c0*h being evaluated in an extended precision” (page 10) which sort of alludes to the question I have. – Pascal Cuoq May 18 '14 at 11:42
  • I also don't understand that recommendation. Certainly you'd store `Cn` and `sin(Cn)` in a table. But why not build a single approximation for the residual `sin(Cn) * (cos(h) - 1) + cos(Cn) * sin(h)` instead of individual approximations for `cos(h) - 1` and `sin(h)` like you're doing? – tmyklebu May 18 '14 at 14:05
  • @tmyklebu The answer to your “why not build a single approximation for the residual …” question is “it really depends on the degree of the polynomial approximations you wish to involve”. If you are using low-degree polynomials, then you might as well use a single Horner form. If you are using higher degrees, multiplying all these coefficients by `sin(Cn)` and `cos(Cn)` individually becomes expensive. – Pascal Cuoq May 18 '14 at 19:59
  • @njuffa I am not one for mathematical analysis, so I have done a worst-case analysis by exhaustive testing in my answer below. It **is** possible to make a nice single-precision implementation of `sinf`, it turns out. – Pascal Cuoq May 19 '14 at 17:24
  • I like the idea of defining a function for (1-cos(x)) [for values of x below half a circle], since the magnitudes of the input and output tend to be comparable. If one figures there will generally be some uncertainty in the value of x, having the highest output precision available for those parts of the domain where x has the most uncertainty doesn't seem very helpful. I think many issues are largely moot when using the built-in functions for angles which are not inherently precise multiples of one radian, however, unless one goes to great lengths to accurately multiply... – supercat Jun 12 '14 at 22:03
  • ...one's angles by a perfect representation of pi before using the built-in intrinsics. Perhaps someday someone will standardize on intrinics which compute sin(2pi*x) and either cos(2pi*x) or 1-cos(2pi*x). – supercat Jun 12 '14 at 22:04

3 Answers3

6

The implementation below partially answers the question, in that it is a single-precision implementation of sine, using the formula suggested in the question, that is accurate to 0.53 ULP over [0 … 1.57], and accurate to 0.5 ULP for 99.98% of its arguments over this range.

Specifically, I get the output:

error 285758762/536870912 ULP sin(2.11219326e-01) ref:2.09652290e-01 new:2.09652275e-01 
differences: 176880 / 1070134723

Meaning the error was never more than 285/536 of an ULP (about 0.53 ULP), and 176880 is the number of arguments where the error was above 0.5 ULP, out of a total of 1070134723 arguments.

It does not seem possible to achieve this sort of result with the plain sin(Cn) * cos(h) + cos(Cn) * sin(h) formula and only single-precision computations. The article cited in the question alludes to “the term c0*h being evaluated in an extended precision” in order to achieve overall accuracy.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>

float c_cos_sin[][3] = {
  //  0x0.000000000p+0 /* 0.000000 */, 0x1.000000p+0, 0x0.000000p+0,
  //  0x0.00fb76590p+2 /* 0.015348 */, 0x1.fff090p-1, 0x1.f6e7a4p-7,
  //  0x0.01fd02f80p+2 /* 0.031068 */, 0x1.ffc0c0p-1, 0x1.fcee02p-6,
  //  0x0.0302f6280p+2 /* 0.047056 */, 0x1.ff6eeap-1, 0x1.8156aap-5,
  //  0x0.04029a400p+2 /* 0.062659 */, 0x1.fefec8p-1, 0x1.007b94p-4,
  //  0x0.0500a9d80p+2 /* 0.078165 */, 0x1.fe6fcap-1, 0x1.3fd706p-4,
  //  0x0.060215b80p+2 /* 0.093877 */, 0x1.fdbedcp-1, 0x1.7ff4e8p-4,
  //  0x0.070225580p+2 /* 0.109506 */, 0x1.fceee8p-1, 0x1.bfa3fcp-4,
  //  0x0.080460e00p+2 /* 0.125267 */, 0x1.fbfcf6p-1, 0x1.ffc0f6p-4,
  //  0x0.08fed4a00p+2 /* 0.140554 */, 0x1.faf372p-1, 0x1.1ee830p-3,
  //  0x0.0a0054100p+2 /* 0.156270 */, 0x1.f9c2d8p-1, 0x1.3ebd74p-3,
  //  0x0.0afc8eb00p+2 /* 0.171665 */, 0x1.f87978p-1, 0x1.5dd872p-3,
  0x0.0bff5db00p+2 /* 0.187461 */, 0x1.f707b0p-1, 0x1.7dad14p-3,
  0x0.0cfe70200p+2 /* 0.203030 */, 0x1.f57bcep-1, 0x1.9cf438p-3,
  0x0.0e024ef00p+2 /* 0.218891 */, 0x1.f3c87ap-1, 0x1.bcb7a0p-3,
  0x0.0efeab400p+2 /* 0.234294 */, 0x1.f202ecp-1, 0x1.db74a8p-3,
  0x0.10003da00p+2 /* 0.250015 */, 0x1.f014d0p-1, 0x1.fab664p-3,
  0x0.110242c00p+2 /* 0.265763 */, 0x1.ee0660p-1, 0x1.0cf2f4p-2,
  0x0.12055d400p+2 /* 0.281577 */, 0x1.ebd62ap-1, 0x1.1c8a4ap-2,
  0x0.13025de00p+2 /* 0.297019 */, 0x1.e994c2p-1, 0x1.2bb212p-2,
  0x0.13fc96600p+2 /* 0.312292 */, 0x1.e73c4ep-1, 0x1.3a9d34p-2,
  0x0.15014c400p+2 /* 0.328204 */, 0x1.e4abbcp-1, 0x1.4a1472p-2,
  0x0.15fe27a00p+2 /* 0.343637 */, 0x1.e210eep-1, 0x1.58fffep-2,
  0x0.1703b1200p+2 /* 0.359600 */, 0x1.df4050p-1, 0x1.685884p-2,
  0x0.180296e00p+2 /* 0.375158 */, 0x1.dc63e8p-1, 0x1.7736b2p-2,
  0x0.18fc8a600p+2 /* 0.390414 */, 0x1.d9790cp-1, 0x1.85b472p-2,
  0x0.19ffac000p+2 /* 0.406230 */, 0x1.d654fap-1, 0x1.94a1ecp-2,
  0x0.1aff07c00p+2 /* 0.421816 */, 0x1.d31f26p-1, 0x1.a33e6ap-2,
  0x0.1c0162800p+2 /* 0.437585 */, 0x1.cfc21ep-1, 0x1.b1ec42p-2,
  0x0.1cfe63200p+2 /* 0.453027 */, 0x1.cc5a50p-1, 0x1.c0317ep-2,
  0x0.1e0153a00p+2 /* 0.468831 */, 0x1.c8c0f4p-1, 0x1.ceb01ep-2,
  0x0.1efe6d800p+2 /* 0.484279 */, 0x1.c52024p-1, 0x1.dcbe7ep-2,
  0x0.1ffde5600p+2 /* 0.499872 */, 0x1.c15a92p-1, 0x1.ead0fcp-2,
  0x0.20fa9ac00p+2 /* 0.515296 */, 0x1.bd83eap-1, 0x1.f89e82p-2,
  0x0.220491000p+2 /* 0.531529 */, 0x1.b95c6cp-1, 0x1.038212p-1,
  0x0.22ff9c800p+2 /* 0.546851 */, 0x1.b55542p-1, 0x1.0a3d7ap-1,
  0x0.23faafc00p+2 /* 0.562176 */, 0x1.b133aep-1, 0x1.10e916p-1,
  0x0.250a2cc00p+2 /* 0.578746 */, 0x1.ac9ed2p-1, 0x1.180d0ep-1,
  0x0.25fee2800p+2 /* 0.593682 */, 0x1.a863d2p-1, 0x1.1e6bdep-1,
  0x0.2700b4000p+2 /* 0.609418 */, 0x1.a3d498p-1, 0x1.251056p-1,
  0x0.28025e000p+2 /* 0.625144 */, 0x1.9f2b7ap-1, 0x1.2ba13ap-1,
  0x0.28f975400p+2 /* 0.640226 */, 0x1.9a9aa0p-1, 0x1.31db54p-1,
  0x0.29fc6dc00p+2 /* 0.656032 */, 0x1.95b7ecp-1, 0x1.384ef4p-1,
  0x0.2afc27c00p+2 /* 0.671640 */, 0x1.90cb6cp-1, 0x1.3e9a4ap-1,
  0x0.2c0659c00p+2 /* 0.687888 */, 0x1.8b90c6p-1, 0x1.45127ap-1,
  0x0.2d017dc00p+2 /* 0.703216 */, 0x1.868952p-1, 0x1.4b18dep-1,
  0x0.2e04f3c00p+2 /* 0.719052 */, 0x1.813e8cp-1, 0x1.513d70p-1,
  0x0.2efcb8800p+2 /* 0.734175 */, 0x1.7c19bcp-1, 0x1.5706f0p-1,
  0x0.300642800p+2 /* 0.750382 */, 0x1.767dc8p-1, 0x1.5d2464p-1,
  0x0.30ff5cc00p+2 /* 0.765586 */, 0x1.7123d0p-1, 0x1.62cb9cp-1,
  0x0.3204f6c00p+2 /* 0.781553 */, 0x1.6b6d98p-1, 0x1.68a4d6p-1,
  0x0.3303af000p+2 /* 0.797100 */, 0x1.65c70cp-1, 0x1.6e4010p-1,
  0x0.34002f400p+2 /* 0.812511 */, 0x1.601740p-1, 0x1.73b86cp-1,
  0x0.35080ac00p+2 /* 0.828616 */, 0x1.5a0f1cp-1, 0x1.79579cp-1,
  0x0.35fda7800p+2 /* 0.843607 */, 0x1.545d16p-1, 0x1.7e7cc6p-1,
  0x0.37040f800p+2 /* 0.859623 */, 0x1.4e31bep-1, 0x1.83e3aep-1,
  0x0.3800eac00p+2 /* 0.875056 */, 0x1.482b1cp-1, 0x1.89002ap-1,
  0x0.390737c00p+2 /* 0.891066 */, 0x1.41d5b8p-1, 0x1.8e3432p-1,
  0x0.39fce7800p+2 /* 0.906061 */, 0x1.3bd3dep-1, 0x1.92fc2ap-1,
  0x0.3b0596c00p+2 /* 0.922216 */, 0x1.3546c4p-1, 0x1.9808d0p-1,
  0x0.3bf971c00p+2 /* 0.937100 */, 0x1.2f2b58p-1, 0x1.9c979ep-1,
  0x0.3d0275800p+2 /* 0.953275 */, 0x1.2874c8p-1, 0x1.a17120p-1,
  0x0.3e02c4400p+2 /* 0.968919 */, 0x1.21e3cap-1, 0x1.a60740p-1,
  0x0.3ef759000p+2 /* 0.983847 */, 0x1.1b8ec4p-1, 0x1.aa4f02p-1,
  0x0.3ff90a800p+2 /* 0.999575 */, 0x1.14d158p-1, 0x1.aeb732p-1,
  0x0.40f703800p+2 /* 1.015077 */, 0x1.0e1baep-1, 0x1.b2f468p-1,
  0x0.420693000p+2 /* 1.031651 */, 0x1.06dcb2p-1, 0x1.b75f2ap-1,
  0x0.4300fb800p+2 /* 1.046935 */, 0x1.001dcep-1, 0x1.bb5678p-1,
  0x0.440282800p+2 /* 1.062653 */, 0x1.f23bb2p-2, 0x1.bf4efcp-1,
  0x0.44fb18000p+2 /* 1.077826 */, 0x1.e49a58p-2, 0x1.c3095ep-1,
  0x0.45fe26000p+2 /* 1.093637 */, 0x1.d647a4p-2, 0x1.c6cfaap-1,
  0x0.4700de800p+2 /* 1.109428 */, 0x1.c7dba0p-2, 0x1.ca77aap-1,
  0x0.47fd2d800p+2 /* 1.124828 */, 0x1.b9af14p-2, 0x1.cdec48p-1,
  0x0.48fd3c000p+2 /* 1.140456 */, 0x1.ab3138p-2, 0x1.d1515ep-1,
  0x0.49f66d000p+2 /* 1.155666 */, 0x1.9cfd2cp-2, 0x1.d48338p-1,
  0x0.4b05ec000p+2 /* 1.172236 */, 0x1.8d67dcp-2, 0x1.d7deb0p-1,
  0x0.4bfebf800p+2 /* 1.187424 */, 0x1.7f0718p-2, 0x1.dad544p-1,
  0x0.4cfa07800p+2 /* 1.202761 */, 0x1.706b10p-2, 0x1.ddb6e0p-1,
  0x0.4e0324800p+2 /* 1.218942 */, 0x1.60e920p-2, 0x1.e0a1e6p-1,
  0x0.4efdb7800p+2 /* 1.234236 */, 0x1.522b24p-2, 0x1.e34658p-1,
  0x0.4ffb51000p+2 /* 1.249714 */, 0x1.432afap-2, 0x1.e5d57ep-1,
  0x0.50fb6d000p+2 /* 1.265346 */, 0x1.33f0b2p-2, 0x1.e84ce2p-1,
  0x0.5200bc000p+2 /* 1.281295 */, 0x1.24536ep-2, 0x1.eab19cp-1,
  0x0.52fbc0800p+2 /* 1.296616 */, 0x1.1541a6p-2, 0x1.ece01ep-1,
  0x0.54066d800p+2 /* 1.312892 */, 0x1.052cfep-2, 0x1.ef1104p-1,
  0x0.550424000p+2 /* 1.328378 */, 0x1.eb9ff8p-3, 0x1.f1077cp-1,
  0x0.55f93c800p+2 /* 1.343337 */, 0x1.cdd470p-3, 0x1.f2cfeap-1,
  0x0.56fa9d000p+2 /* 1.359046 */, 0x1.ae6e42p-3, 0x1.f49074p-1,
  0x0.57ff02000p+2 /* 1.374939 */, 0x1.8e8e2cp-3, 0x1.f63612p-1,
  0x0.58f813000p+2 /* 1.390141 */, 0x1.6ff8f0p-3, 0x1.f7aaf6p-1,
  0x0.5a0036800p+2 /* 1.406263 */, 0x1.4f722cp-3, 0x1.f915dcp-1,
  0x0.5b005b800p+2 /* 1.421897 */, 0x1.2fd214p-3, 0x1.fa55aep-1,
  0x0.5bfe01800p+2 /* 1.437378 */, 0x1.106e24p-3, 0x1.fb732ap-1,
  0x0.5cf89f000p+2 /* 1.452675 */, 0x1.e2b3c0p-4, 0x1.fc6ea8p-1,
  0x0.5e00f4800p+2 /* 1.468808 */, 0x1.a104e8p-4, 0x1.fd56eap-1,
  0x0.5f0527000p+2 /* 1.484689 */, 0x1.60420cp-4, 0x1.fe1a64p-1,
  0x0.5fffc8000p+2 /* 1.499987 */, 0x1.21cb4cp-4, 0x1.feb78ap-1,
  0x0.610318000p+2 /* 1.515814 */, 0x1.c23092p-5, 0x1.ff39eep-1,
  0x0.62032d800p+2 /* 1.531444 */, 0x1.424a9cp-5, 0x1.ff9a86p-1,
  0x0.62fd46000p+2 /* 1.546709 */, 0x1.8a9d8cp-6, 0x1.ffd9fap-1,
  0x0.63fbc1800p+2 /* 1.562241 */, 0x1.1856c2p-7, 0x1.fffb34p-1,
  0x0.65011f000p+2 /* 1.578193 */, -0x1.e4c59ap-8, 0x1.fffc6ap-1,
};

/*@ requires 0 <= x <= 1.6 ; */
float my_sinf(float x)
{
  const float offs = 0x0.0b8p+2f;
  if (x < offs)
    {
      float xx = x * x;
      /* Remez-optimized polynomial for relative accuracy on -0.164 .. 0.164,
         Not the full -0.18 .. 0.18 where it is used, which makes it worse
         on -0.164 .. 0.164. But even optimized without regard for 0.164 .. 0.18
         It is better than the table entry + correction there so we use it there
      */
      return x + x * xx * (-0.16666660487324f + xx * 8.3259065018069e-3f);
    }
  int i = (x - offs) * 64.0f;
  float *p = c_cos_sin[i];
  float F = p[0];
  float C = p[1];
  float S = p[2];
  float h = x - F;
#if 0
  float s = S * (cosl(h) - 1.0) + C * sinl(h); // ext-double computation
#endif
#if 1
  // Two Remez-optimized polynomials for absolute accuracy on -0.008 .. 0.008
  float s =  h * (C + h * (-0.4999976959797f * S + h * -0.166666183241f * C));
#endif
  return S + s; 
}

unsigned int m, c, t;
uint64_t max_ulp;

int main(){
  for (float f = 0.0f; f < 1.57f; f = nextafterf(f, 3.0f))
    {
      double rd = sin(f);
      float r = rd;
      float n = my_sinf(f);
      t++;
      if (r != n)
        {
          c++;
          uint64_t in, ir;
          double nd = n;
          memcpy(&in, &nd, 8);
          memcpy(&ir, &rd, 8);
          uint64_t ulp = in > ir ? in - ir : ir - in;
          if (ulp > max_ulp)
            printf("error %" PRIu64 "/536870912 ULP sin(%.8e) ref:%.8e new:%.8e \n", 
                   ulp, f, r, n);
          if (ulp > max_ulp)
              max_ulp = ulp;
        }
    }
  printf("differences: %u / %u\n", c, t);
}
Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
4

Well, this formula is a start. Then other transformations could be done, depending on the context. I agree that if the formula sin(x) = sin(Cn) * cos(h) + cos(Cn) * sin(h) is applied in the target precision, then the rounding error of sin(Cn) * cos(h) is up to 1/2 ulp of the result, which is bad if the goal is to get an accurate result. However some terms are sometimes expressed in greater precision by using pseudo-expansions. For instance, a number can be represented by a pair (a,b) where b is much smaller than a and whose value is regarded as a+b. In such a case, cos(h) could be represented by a pair (1,h') and the computation would be equivalent to what you suggest.

Alternatively, the implementation can be detailed once the formulas to evaluate cos(h) and sin(h) are given. See Section 3.1 in Stehlé and Zimmermann's paper you cited: they define C*(h) = C(h) − 1, and use C* in the final formula, which is basically what you suggest.

Note: I'm note sure that using the above formula is the best choice. One could start with sin(x) = sin(Cn) + error_term, and compute the error term in some other way.

vinc17
  • 2,829
  • 17
  • 23
-3

You are meeting the boundary between theoretical mathematics and practical numerical calculations.

The trigonometric identity

sin(a + b) = sin(a) * cos( b ) + cos(a) * sin(b)

gives rise to the formula that you quote:

sin(x) = sin(Cn) * cos(h) + cos(Cn) * sin(h)

when you substitute Cn + h for x. This formula is mathematically exact.

However, due to the limitations of practical numeric computation, i.e. floating point arithmetic in your case, we do not have the infinite precision available to numerically calculate such formula exactly. In practice we need to consider the accuracy that we can use to represent the values in the table and what errors are introduced when we perform limited accuracy calculations on these limited accuracy values. The mathematical discipline that deals with practical numerical computation is Numerical Analysis.

There is a very brief summary of Numerical Analysis on Wikipedia with many links to various topics within the subject. I think that you might find Computing values of functions and Interpolation, Extrapolation, and Regression of particular relevance.

Craunch
  • 20
  • 2
  • 2
    My question assumes that the reader understands that floating-point computations are not real computations, but unless the Wikipedia page explicitly mentions a reason to use `sin(Cn) * cos(h) + cos(Cn) * sin(h)` over `sin(Cn) + (sin(Cn) * (cos(h) - 1.0) + cos(Cn) * sin(h))`, then you are not answering my question. You are answering another question along the lines of “Why does this article recommend `sin(Cn) * cos(h) + cos(Cn) * sin(h)`?”, which is very different from my question although it is a prefix of it. – Pascal Cuoq May 18 '14 at 13:39
  • 2
    Note that my question contains results indicating that my formula appears to be more accurate that the standard formula. “This is Numerical Analysis” does not explain why the imprecise formula is considered standard. If anything, the formula that allows near 0.5ULP accuracy without extended precision should be considered standard in Numerical Analysis, should it not? – Pascal Cuoq May 18 '14 at 13:45