0

I want to calculate Rosenbrock's test function

enter image description here

I have implemented the following C/C++ code

#include <stdio.h>

/********/
/* MAIN */
/********/
int main()
{
    const int N = 900000;

    float *x = (float *)malloc(N * sizeof(float));
    for (int i=0; i<N; i++) x[i] = 3.f;

    float sum_host = 0.f;
    for (int i=0; i<N-1; i++) {
        float temp = (100.f * (x[i+1] - x[i] * x[i]) * (x[i+1] - x[i] * x[i]) + (x[i] - 1.f) * (x[i] - 1.f));
        sum_host = sum_host + temp;
        printf("%i %f %f\n", i, temp, sum_host);
    }
    printf("Result for Rosenbrock's test function calculation = %f\n", sum_host);

}

Since the x array is initialized to 3.f, then each summation term should be 3604.f, so that the final summation involving 899999 terms should be 3243596396. However, the result I get is 3229239296, with an absolute error of 14357100. If I measure the difference between two consecutive partial summations, I see that it is 3600.f for the early partial summations and then it drops to 3584 for the last ones, while it should always be 3604.f.

If I use Kahan summation algorithm as

sum_host = 0.f;
float c        = 0.f;
for (int i=0; i<N-1; i++) {
    float temp = (100.f * (x[i+1] - x[i] * x[i]) * (x[i+1] - x[i] * x[i]) + (x[i] - 1.f) * (x[i] - 1.f)) - c;
    float t    = sum_host + temp;
    c          = (t - sum_host) - temp;
    sum_host = t;
}

the result I get is 3243596288, with a much smaller absolute error of 108.

I'm pretty sure that this effect I see should be ascribed to the precision of floating point arithmetics. Could someone confirm this and provide me an explanation of the mechanism according to which this occurs?

Vitality
  • 20,705
  • 4
  • 108
  • 146
  • It starts at 3604 for me, rather than 3600. Did you make a typo? – tmyklebu Jan 07 '15 at 22:29
  • @tmyklebu That value (`3600`) does not refer to the very first finite difference between consecutive partial summations, but to (about) the `26000`th difference. – Vitality Jan 07 '15 at 22:34
  • @chux That is indeed a typo. Fixed. – Vitality Jan 07 '15 at 22:35
  • @JackOLantern The solution to this problem, in case you missed it in Chux's answer is to replace every instance of the word `float` with the word `double`, and fix the constants as well, e.g. `3.f` becomes `3.0`. The calculation that you're doing won't exceed the precision of a `double`, and hence the results of the calculation will have 0 error. – user3386109 Jan 07 '15 at 23:01
  • @user3386109 Agree with the suggestion about changing `float` to `double` except that the array `*x` may remain `float` for space considerations. Simple way to make `x[i] * x[i]` work at `double` is to do `1.0 * x[i] * x[i]` or `(double) x[i] * x[i]`. – chux - Reinstate Monica Jan 08 '15 at 00:08
  • 1
    @chux To quote some anonymous wise guy, "Premature optimization is the root of all evil." Which is to say that you shouldn't optimize for space unless space is actually a problem. Personally, I've never worked with a computer that could store 900000 floats, but couldn't just as easily store 900000 doubles. (In the good old days, I did work with PCs that couldn't store 900000 doubles, but they couldn't store 900000 floats either.) – user3386109 Jan 08 '15 at 00:25

2 Answers2

4

You compute temp = 3604.0f accurately at each iteration. The problem arises when you try adding 3604.0f to something else and round the result to the nearest float. floats store an exponent and a 23-bit significand, meaning any result with 1-bits more than 24 places apart is going to get rounded to something other than what it is.

Note that 3604 = 901 * 4 and the binary expansion of 901 is 1110000101; you'll start seeing roundoff once you start adding temp to something bigger than 2^24 * 4 = 67108864. (This happens when you run the code, too; it starts printing out 3600 as the difference between consecutive sum_host's right when sum_host exceeds 67108864.) You start seeing even more roundoff when you're adding temp to something bigger than 2^26 * 4; at that point, the second smallest '1' bit is getting swallowed as well.

Note that, after you do Kahan summation, sum_host is what you report AND c is -108. This is loosely because c is keeping track of the next most significant 24 bits.

tmyklebu
  • 13,915
  • 3
  • 28
  • 57
2

Typical float is only good to maybe 7 digits of precision. Repeatedly adding 3604 to a number 100000x larger than it does not well accumulate the lesser significant digits.

Use double.

chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
  • I downvoted because I don't think your answer addresses the question to any depth. Downvotes are reversible; if you improve your answer, I'm happy to turn the downvote into an upvote. – tmyklebu Jan 07 '15 at 22:42
  • @tmyklebu I posted, you posted 10 seconds later, you down voted 3 seconds later. Hmmm. This answer is terse and correct. It goes into enough depth to explain the issue and offers a solution. – chux - Reinstate Monica Jan 07 '15 at 23:13