
Is there a difference in accuracy between pow(a/b,x) and pow(b/a,-x)? If there is, does raising a number less than 1 to a positive power or a number greater than 1 to a negative power produce more accurate result?

Edit: Let's assume x86_64 processor and gcc compiler.

Edit: I tried comparing using some random numbers. For example:

printf("%.20f\n", pow(8.72138221/1.761329479, -1.51231)); // 0.08898783049228660424
printf("%.20f\n", pow(1.761329479/8.72138221,  1.51231)); // 0.08898783049228659037

So, it looks like there is a difference (albeit minuscule in this case), but maybe someone who knows about the algorithm implementation could comment on what the maximum difference is, and under what conditions.

SU3
  • Test them and see? It probably depends a lot on the implementation of `pow()` being used. – Shawn Apr 09 '19 at 06:31
  • You'd have to provide your architecture, at least, for anyone to be able to answer that. – xaxxon Apr 09 '19 at 06:31
  • My guess is that it doesn't matter much, the representation has about the same accuracy in both directions. But the standard doesn't give you hard guarantees for `pow`, as it is not one of the standard operations that must be accurate. – starblue Apr 09 '19 at 06:52
  • Please don't use `n` for non-integers, that is misleading. Often different algorithms are used for integer and non-integer exponents. – starblue Apr 09 '19 at 06:54
  • The difference might not be entirely due to `pow` because divisions are done beforehand (perhaps by the compiler), which is the first opportunity for accuracy. – Weather Vane Apr 09 '19 at 07:10
  • Note that your input numbers are not representable with IEEE 754 double-precision floating point type. Do you care about the accuracy with respect to the exact calculation? https://www.wolframalpha.com/input/?i=(1.761329479+%2F+8.72138221)%5E1.51231 – Daniel Langr Apr 09 '19 at 13:58
  • @starblue It doesn't have about the same accuracy in both directions (for `double`), check my answer. For one direction, the error was about twice as big, which might matter in some applications. However, agree that this is implementation-dependent. – Daniel Langr Apr 09 '19 at 14:07
  • I'm surprised that no one is including `pow(a, n) * pow(b, -n)` in the comparison. – Ben Voigt Apr 13 '19 at 01:42

4 Answers

Here's one way to answer such questions: observe how the floating-point arithmetic actually behaves. This is not a 100% rigorous way to analyze such questions, but it gives a general idea.

Let's generate random numbers. Calculate v0=pow(a/b, n) and v1=pow(b/a, -n) in float precision, and calculate ref=pow(a/b, n) in double precision, rounded to float. We use ref as the reference value (we assume that double has much more precision than float, so ref can be trusted as the best possible value; this is true for IEEE-754 most of the time). Then sum the differences v0−ref and v1−ref, where each difference is measured as "the number of representable floating-point numbers between the two values" (ULP distance).

Note that the results may depend on the ranges of a, b and n (and on the quality of the random generator; if it's really bad, it may give a biased result). Here, I've used a in [0..1], b in [0..1] and n in [-2..2]. Furthermore, this answer assumes that the float and double implementations of division and pow are of the same kind and have the same characteristics.

For my computer, the summed differences are 2604828 and 2603684, which means there is no significant precision difference between the two forms.

Here's the code (note, this code supposes IEEE-754 arithmetic):

#include <cmath>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Distance between two floats, measured in representable float values
// (ULP distance): reinterpret the bit patterns as integers and take the
// absolute difference. Valid for finite IEEE-754 values of the same sign.
long long int diff(float a, float b) {
    unsigned int ai, bi;
    memcpy(&ai, &a, 4);
    memcpy(&bi, &b, 4);
    long long int diff = (long long int)ai - bi;
    if (diff<0) diff = -diff;
    return diff;
}

int main() {
    long long int e0 = 0;
    long long int e1 = 0;
    for (int i=0; i<10000000; i++) {
        // random a, b in [0, 1] and n in [-2, 2]
        float a = 1.0f*rand()/RAND_MAX;
        float b = 1.0f*rand()/RAND_MAX;
        float n = 4.0f*rand()/RAND_MAX - 2.0f;

        if (a==0||b==0) continue;

        float v0 = std::pow(a/b, n);
        float v1 = std::pow(b/a, -n);
        // reference value: computed in double, then rounded to float
        float ref = std::pow((double)a/b, n);

        e0 += diff(ref, v0);
        e1 += diff(ref, v1);
    }
    }

    printf("%lld %lld\n", e0, e1);
}
geza

In general, the form with the positive power is slightly better, although by so little it will likely have no practical effect. Specific cases could be distinguished. For example, if either a or b is a power of two, it ought to be used as the denominator, as the division then has no rounding error.

In this answer, I assume IEEE-754 binary floating-point with round-to-nearest-ties-to-even and that the values involved are in the normal range of the floating-point format.

Given a, b, and x with values a, b, and x, and an implementation of pow that computes the representable value nearest the ideal mathematical value (actual implementations are generally not this good), pow(a/b, x) computes (a/b•(1+e0))^x•(1+e1), where e0 is the rounding error that occurs in the division and e1 is the rounding error that occurs in the pow, and pow(b/a, -x) computes (b/a•(1+e2))^−x•(1+e3), where e2 and e3 are the rounding errors in this division and this pow, respectively.

Each of the errors e0, …, e3 lies in the interval [−u/2, u/2], where u is the unit of least precision (ULP) of 1 in the floating-point format. (The notation [p, q] is the interval containing all values from p to q, including p and q.) In case a result is near the edge of a binade (where the floating-point exponent changes and the significand is near 1), the lower bound may be −u/4. At this time, I will not analyze this case.

Rewriting, these are (a/b)^x•(1+e0)^x•(1+e1) and (a/b)^x•(1+e2)^−x•(1+e3). This reveals the primary difference is in (1+e0)^x versus (1+e2)^−x. The 1+e1 versus 1+e3 is also a difference, but this is just the final rounding. [I may consider further analysis of this later but omit it for now.]

Consider (1+e0)^x and (1+e2)^−x. The potential values of the first expression span [(1−u/2)^x, (1+u/2)^x], while the second spans [(1+u/2)^−x, (1−u/2)^−x]. When x > 0, the second interval is longer than the first:

  • The length of the first is (1+u/2)^x − (1−u/2)^x.
  • The length of the second is (1/(1−u/2))^x − (1/(1+u/2))^x.
  • Multiplying the latter by (1−u^2/4)^x produces ((1−u^2/4)/(1−u/2))^x − ((1−u^2/4)/(1+u/2))^x = (1+u/2)^x − (1−u/2)^x, which is the length of the first interval.
  • 1−u^2/4 < 1, so (1−u^2/4)^x < 1 for positive x.
  • Since the first length equals the second length times a number less than one, the first interval is shorter.

Thus, the form in which the exponent is positive is better in the sense that it has a shorter interval of potential results.

Nonetheless, this difference is very slight. I would not be surprised if it were unobservable in practice. Also, one might be concerned with the probability distribution of errors rather than the range of potential errors. I suspect this would also favor positive exponents.

Eric Postpischil

... between pow(a/b,x) and pow(b/a,-x) ... does raising a number less than 1 to a positive power or a number greater than 1 to a negative power produce more accurate result?

Whichever division is more accurate.


Consider z = x^y = 2^(y * log2(x)).

Roughly: the error in y * log2(x) is magnified by the value of z to form the error in z. x^y is very sensitive to the error in x. The larger |log2(x)| is, the greater the concern.

In OP's case, both pow(a/b,p) and pow(b/a,-p), in general, have the same y * log2(x), the same z, and similar errors in z. It is a question of how x and y are formed:


a/b and b/a, in general, both have the same error bound of ±0.5 unit in the last place, and so both approaches have similar error.

Yet with select values of a and b, one quotient will be more exact, and it is that approach which has the lower pow() error.

pow(7777777.0/4, -p) can be expected to be more accurate than pow(4.0/7777777, p), because dividing by a power of two introduces no rounding error.

Lacking assurance about the error in the division, the general rule applies: no major difference.

chux - Reinstate Monica
  • Barring a case like one of the numbers is exactly a power of two, how would you know which division is more accurate? What code would you write for this? – Eric Postpischil Apr 10 '19 at 01:09
  • 1
    @EricPostpischil Perhaps some comparison involving `fmod(a,b)` and `fmod(b,a)`. – chux - Reinstate Monica Apr 10 '19 at 01:40
  • That’s a start. Except it gives you information on the absolute fraction part of the quotient. We would need to know the fraction that occurs at the point of rounding. Which seems unrelated. – Eric Postpischil Apr 10 '19 at 01:51
  • @EricPostpischil Hmmm, I think it might be as simple as the lesser of the `fabs(log2(fmod(a,b))), fabs(log2(fmod(b,a)))`. I'll dig deeper later: tax season right now ;-( – chux - Reinstate Monica Apr 10 '19 at 03:25

For evaluation of rounding errors like in your case, it might be useful to use a multi-precision library, such as Boost.Multiprecision. Then, you can compare results for various precisions, as in the following program:

#include <iomanip>
#include <iostream>
#include <boost/multiprecision/cpp_bin_float.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>

namespace mp = boost::multiprecision;

template <typename FLOAT>
void comp() {
  FLOAT a = 8.72138221;
  FLOAT b = 1.761329479;
  FLOAT c = 1.51231;

  FLOAT e = mp::pow(a / b, -c);
  FLOAT f = mp::pow(b / a, c);

  std::cout << std::fixed << std::setw(40) << std::setprecision(40) << e << std::endl;
  std::cout << std::fixed << std::setw(40) << std::setprecision(40) << f << std::endl;
}

int main() {
  std::cout << "Double: " << std::endl;
  comp<mp::cpp_bin_float_double>();
  std::cout << std::endl;

  std::cout << "Double extended: " << std::endl;
  comp<mp::cpp_bin_float_double_extended>();
  std::cout << std::endl;

  std::cout << "Quad: " << std::endl;
  comp<mp::cpp_bin_float_quad>();
  std::cout << std::endl;

  std::cout << "Dec-100: " << std::endl;
  comp<mp::cpp_dec_float_100>();
  std::cout << std::endl;
}

Its output reads, on my platform:

Double: 
0.0889878304922865903670015086390776559711
0.0889878304922866181225771242679911665618

Double extended: 
0.0889878304922865999079806265115166752366
0.0889878304922865999012043629334822725241

Quad: 
0.0889878304922865999004910375213273866639
0.0889878304922865999004910375213273505527

Dec-100: 
0.0889878304922865999004910375213273881004
0.0889878304922865999004910375213273881004

Live demo: https://wandbox.org/permlink/tAm4sBIoIuUy2lO6

For double, the first calculation was more accurate; however, I guess one cannot draw any general conclusions from this single case.


Also, note that your input numbers are not exactly representable in the IEEE 754 double-precision floating-point type (none of them is). The question is whether you care about the accuracy of calculations with those exact numbers or with their closest representations.

Daniel Langr