
Is there a difference in accuracy between pow(a/b,x) and pow(b/a,-x)? If there is, does raising a number less than 1 to a positive power or a number greater than 1 to a negative power produce more accurate result?

Edit: Let's assume x86_64 processor and gcc compiler.

Edit: I tried comparing using some random numbers. For example:

printf("%.20f\n", pow(8.72138221/1.761329479, -1.51231)); // 0.08898783049228660424
printf("%.20f\n", pow(1.761329479/8.72138221,  1.51231)); // 0.08898783049228659037

So, it looks like there is a difference (albeit minuscule in this case), but maybe someone who knows about the algorithm implementation could comment on what the maximum difference is, and under what conditions.

SU3
  • Test them and see? It probably depends a lot on the implementation of `pow()` being used. – Shawn Apr 09 '19 at 06:31
  • You'd have to provide your architecture, at least, for anyone to be able to answer that. – xaxxon Apr 09 '19 at 06:31
  • My guess is that it doesn't matter much, the representation has about the same accuracy in both directions. But the standard doesn't give you hard guarantees for `pow`, as it is not one of the standard operations that must be accurate. – starblue Apr 09 '19 at 06:52
  • Please don't use `n` for non-integers, that is misleading. Often different algorithms are used for integer and non-integer exponents. – starblue Apr 09 '19 at 06:54
  • The difference might not be entirely due to `pow` because divisions are done beforehand (perhaps by the compiler), which is the first opportunity for accuracy. – Weather Vane Apr 09 '19 at 07:10
  • Note that your input numbers are not representable with IEEE 754 double-precision floating point type. Do you care about the accuracy with respect to the exact calculation? https://www.wolframalpha.com/input/?i=(1.761329479+%2F+8.72138221)%5E1.51231 – Daniel Langr Apr 09 '19 at 13:58
  • @starblue It doesn't have about the same accuracy in both directions (for `double`), check my answer. For one direction, the error was about twice as big, which might matter in some applications. However, agree that this is implementation-dependent. – Daniel Langr Apr 09 '19 at 14:07
  • I'm surprised that no one is including `pow(a, n) * pow(b, -n)` in the comparison. – Ben Voigt Apr 13 '19 at 01:42

4 Answers

Here's one way to answer such questions: observe how the floating-point arithmetic actually behaves. This is not a 100% rigorous way to analyze such questions, but it gives a general idea.

Let's generate random numbers. Calculate v0=pow(a/b, n) and v1=pow(b/a, -n) in float precision, and calculate ref=pow(a/b, n) in double precision, rounded to float. We use ref as the reference value (we assume that double has much more precision than float, so ref can be trusted as the best possible value; this is true for IEEE-754 most of the time). Then sum the differences v0−ref and v1−ref, where each difference is measured as "the number of representable floating-point numbers between the two values" (ULP distance).

Note that the results may depend on the ranges of a, b and n (and on the quality of the random generator; if it's really bad, it may give a biased result). Here, I've used a in [0..1], b in [0..1] and n in [-2..2]. Furthermore, this answer assumes that the float and double implementations of division and pow are of the same kind and have the same characteristics.

For my computer, the summed differences are 2604828 and 2603684, which means there is no significant precision difference between the two forms.

Here's the code (note, this code supposes IEEE-754 arithmetic):

#include <cmath>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Distance between two floats, measured in representable float values
// (ULP distance): reinterpret the bit patterns as integers and take the
// absolute difference. Valid for finite IEEE-754 values of the same sign.
long long int diff(float a, float b) {
    unsigned int ai, bi;
    memcpy(&ai, &a, 4);
    memcpy(&bi, &b, 4);
    long long int diff = (long long int)ai - bi;
    if (diff<0) diff = -diff;
    return diff;
}

int main() {
    long long int e0 = 0;
    long long int e1 = 0;
    for (int i=0; i<10000000; i++) {
        // random a, b in [0, 1] and n in [-2, 2]
        float a = 1.0f*rand()/RAND_MAX;
        float b = 1.0f*rand()/RAND_MAX;
        float n = 4.0f*rand()/RAND_MAX - 2.0f;

        if (a==0||b==0) continue;

        float v0 = std::pow(a/b, n);
        float v1 = std::pow(b/a, -n);
        // reference value: computed in double, then rounded to float
        float ref = std::pow((double)a/b, n);

        e0 += diff(ref, v0);
        e1 += diff(ref, v1);
    }
    }

    printf("%lld %lld\n", e0, e1);
}
geza

In general, the form with the positive power is slightly better, although by so little it will likely have no practical effect. Specific cases could be distinguished. For example, if either a or b is a power of two, it ought to be used as the denominator, as the division then has no rounding error.

In this answer, I assume IEEE-754 binary floating-point with round-to-nearest-ties-to-even and that the values involved are in the normal range of the floating-point format.

Given a, b, and x with values a, b, and x, and an implementation of pow that computes the representable value nearest the ideal mathematical value (actual implementations are generally not this good), pow(a/b, x) computes (a/b•(1+e0))^x•(1+e1), where e0 is the rounding error that occurs in the division and e1 is the rounding error that occurs in the pow, and pow(b/a, -x) computes (b/a•(1+e2))^−x•(1+e3), where e2 and e3 are the rounding errors in this division and this pow, respectively.

Each of the errors e0, …, e3 lies in the interval [−u/2, u/2], where u is the unit of least precision (ULP) of 1 in the floating-point format. (The notation [p, q] is the interval containing all values from p to q, including p and q.) In case a result is near the edge of a binade (where the floating-point exponent changes and the significand is near 1), the lower bound may be −u/4. At this time, I will not analyze this case.

Rewriting, these are (a/b)^x•(1+e0)^x•(1+e1) and (a/b)^x•(1+e2)^−x•(1+e3). This reveals the primary difference is in (1+e0)^x versus (1+e2)^−x. The 1+e1 versus 1+e3 is also a difference, but this is just the final rounding. [I may consider further analysis of this later but omit it for now.]

Consider (1+e0)^x and (1+e2)^−x. The potential values of the first expression span [(1−u/2)^x, (1+u/2)^x], while the second spans [(1+u/2)^−x, (1−u/2)^−x]. When x > 0, the second interval is longer than the first:

  • The length of the first is (1+u/2)^x − (1−u/2)^x.
  • The length of the second is (1/(1−u/2))^x − (1/(1+u/2))^x.
  • Multiplying the latter by (1−u^2/4)^x produces ((1−u^2/4)/(1−u/2))^x − ((1−u^2/4)/(1+u/2))^x = (1+u/2)^x − (1−u/2)^x, which is the length of the first interval.
  • 1−u^2/4 < 1, so (1−u^2/4)^x < 1 for positive x.
  • Since the first length equals the second length times a number less than one, the first interval is shorter.

Thus, the form in which the exponent is positive is better in the sense that it has a shorter interval of potential results.

Nonetheless, this difference is very slight. I would not be surprised if it were unobservable in practice. Also, one might be concerned with the probability distribution of errors rather than the range of potential errors. I suspect this would also favor positive exponents.

Eric Postpischil

... between pow(a/b,x) and pow(b/a,-x) ... does raising a number less than 1 to a positive power or a number greater than 1 to a negative power produce more accurate result?

Whichever division is more accurate.


Consider z = x^y = 2^(y * log2(x)).

Roughly: the error in y * log2(x) is magnified by the value of z to form the error in z. x^y is very sensitive to the error in x. The larger |log2(x)| is, the greater the concern.

In OP's case, both pow(a/b,p) and pow(b/a,-p), in general, have the same y * log2(x), the same z, and similar errors in z. It is a question of how x and y are formed:


a/b and b/a, in general, both have the same error bound of ±0.5 unit in the last place, and so both approaches have similar error.

Yet with select values of a and b, one quotient will be more exact, and it is that approach which has the lower pow() error.

pow(7777777.0/4, -p) can be expected to be more accurate than pow(4.0/7777777, p), because dividing by a power of two introduces no rounding error.

Lacking assurance about the error in the division, the general rule applies: no major difference.

chux - Reinstate Monica
  • Barring a case like one of the numbers is exactly a power of two, how would you know which division is more accurate? What code would you write for this? – Eric Postpischil Apr 10 '19 at 01:09
  • 1
    @EricPostpischil Perhaps some comparison involving `fmod(a,b)` and `fmod(b,a)`. – chux - Reinstate Monica Apr 10 '19 at 01:40
  • That’s a start. Except it gives you information on the absolute fraction part of the quotient. We would need to know the fraction that occurs at the point of rounding. Which seems unrelated. – Eric Postpischil Apr 10 '19 at 01:51
  • @EricPostpischil Hmmm, I think it might be as simple as the lesser of the `fabs(log2(fmod(a,b))), fabs(log2(fmod(b,a)))`. I'll dig deeper later: tax season right now ;-( – chux - Reinstate Monica Apr 10 '19 at 03:25

For evaluation of rounding errors like in your case, it might be useful to use a multi-precision library, such as Boost.Multiprecision. Then, you can compare results for various precisions, as in the following program:

#include <iomanip>
#include <iostream>
#include <boost/multiprecision/cpp_bin_float.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>

namespace mp = boost::multiprecision;

template <typename FLOAT>
void comp() {
  FLOAT a = 8.72138221;
  FLOAT b = 1.761329479;
  FLOAT c = 1.51231;

  FLOAT e = mp::pow(a / b, -c);
  FLOAT f = mp::pow(b / a, c);

  std::cout << std::fixed << std::setw(40) << std::setprecision(40) << e << std::endl;
  std::cout << std::fixed << std::setw(40) << std::setprecision(40) << f << std::endl;
}

int main() {
  std::cout << "Double: " << std::endl;
  comp<mp::cpp_bin_float_double>();
  std::cout << std::endl;

  std::cout << "Double extended: " << std::endl;
  comp<mp::cpp_bin_float_double_extended>();
  std::cout << std::endl;

  std::cout << "Quad: " << std::endl;
  comp<mp::cpp_bin_float_quad>();
  std::cout << std::endl;

  std::cout << "Dec-100: " << std::endl;
  comp<mp::cpp_dec_float_100>();
  std::cout << std::endl;
}

Its output reads, on my platform:

Double: 
0.0889878304922865903670015086390776559711
0.0889878304922866181225771242679911665618

Double extended: 
0.0889878304922865999079806265115166752366
0.0889878304922865999012043629334822725241

Quad: 
0.0889878304922865999004910375213273866639
0.0889878304922865999004910375213273505527

Dec-100: 
0.0889878304922865999004910375213273881004
0.0889878304922865999004910375213273881004

Live demo: https://wandbox.org/permlink/tAm4sBIoIuUy2lO6

For double, the first calculation was more accurate; however, I guess one cannot draw any general conclusions from this single case.


Also, note that your input numbers are not exactly representable in the IEEE 754 double-precision floating-point type (none of them is). The question is whether you care about the accuracy of calculations with those exact numbers or with their closest representations.

Daniel Langr