
I have implemented class NaturalNum for representing a natural number of "infinite" size (up to 4GB).

I have also implemented class RationalNum for representing a rational number with infinite accuracy. It stores the numerator and the denominator of the rational number, both of which are NaturalNum instances, and relies on them when performing any arithmetic operation issued by the user.

The only place where precision is "dropped by a certain degree" is upon printing, since there is a limit (provided by the user) to the number of digits that appear after the decimal (or, more generally, radix) point.

My question concerns one of the constructors of class RationalNum. Namely, the constructor that takes a double value, and computes the corresponding numerator and denominator.

My code is given below, and I would like to know if anyone sees a more accurate way for computing them:

RationalNum::RationalNum(double value)
{
    if (value == value+1)
        throw "Infinite Value";
    if (value != value)
        throw "Undefined Value";

    m_sign        = false;
    m_numerator   = 0;
    m_denominator = 1;

    if (value < 0)
    {
        m_sign = true;
        value  = -value;
    }

    // Here is the actual computation
    while (value > 0)
    {
        unsigned int floor = (unsigned int)value;
        value         -= floor;
        m_numerator   += floor;
        value         *= 2;
        m_numerator   *= 2;
        m_denominator *= 2;
    }

    NaturalNum gcd = GCD(m_numerator,m_denominator);
    m_numerator   /= gcd;
    m_denominator /= gcd;
}

Note: variables starting with 'm_' are member variables.

Thanks

barak manos
    Decompose the float into sign, mantissa and exponent, and then you have enough information to represent it exactly. – Eric Lippert Jan 18 '14 at 23:41
  • Wouldn't I be risking the possibility of different FP representations on different compilers (or CPU architectures)? – barak manos Jan 18 '14 at 23:43
  • @barakmanos Are you programming for an IBM mainframe or a Cray supercomputer? If you are not, then forget about non-IEEE 754 representations. – Pascal Cuoq Jan 19 '14 at 16:22
  • @Pascal Cuoq: Thanks. As you can see, I am not relying on any FP representation whatsoever. – barak manos Jan 19 '14 at 16:26
  • What are the types of m_numerator and m_denominator? – Patricia Shanahan Jan 19 '14 at 16:28
  • @Patricia Shanahan: NaturalNum (unlimited-size unsigned integers), as mentioned at the beginning of the question. – barak manos Jan 19 '14 at 16:30
  • Just updated my answer, by the way. You might appreciate it. – vmrob Jan 21 '14 at 00:04
  • @vmrob, Thank you, but I'm wondering if this code achieves a higher accuracy when calculating the numerator and denominator of a given `double` value. Even **one specific value**, if you can provide an example. Otherwise, I might as well use the code I have provided within my question, which is far "cleaner" and easier to understand. Thanks again :) – barak manos Jan 21 '14 at 15:07

2 Answers


The standard library contains a function for obtaining the significand and exponent: frexp.

Just multiply the significand to get all the bits before the decimal point and set the appropriate denominator. Don't forget that the significand is normalized to be between 0.5 and 1 (I would consider between 1 and 2 more natural, but whatever) and that it has 53 significant bits for an IEEE double (there are no practically used platforms with a different floating-point format).

Jan Hudec
  • Thanks; Asking the same question as on one of the comments above: wouldn't I be risking the possibility of different FP representations on different compilers (or CPU architectures)? Or is `frexp` guaranteed to be implemented accordingly in any case? – barak manos Jan 18 '14 at 23:54
  • The significand (not mantissa; significands are linear, but mantissas are logarithmic) has 53 significant bits, not 52. 52 is just the number of bits that are explicitly encoded. – Eric Postpischil Jan 19 '14 at 00:30
  • @EricPostpischil: Thanks, updated (for some reason I thought the exponent is 12, not 11 bits). – Jan Hudec Jan 19 '14 at 17:50
  • @EricPostpischil: I've changed the terminology, but note, that [wikipedia](http://en.wikipedia.org/wiki/Significand) has "mantissa" mentioned as synonym of significand. – Jan Hudec Jan 19 '14 at 17:52
  • @JanHudec: And that Wikipedia page says “mantissa” is discouraged by the IEEE floating-point committee and others. The IEEE 754 standard does not use “mantissa” at all. Some of us remember when mantissas had to be looked up in tables of logarithms. The difference between a mantissa and a significand will be painfully evident when you do floating-point math by hand on paper. – Eric Postpischil Jan 19 '14 at 17:55

I'm not 100% confident in the math you have for the actual computation, since I haven't really examined it, but I think the method below removes the need for the GCD function, which could add unnecessary running time.

Here is the class I came up with. I haven't fully tested it, but I produced a couple of billion random doubles and the asserts never fired, so I'm reasonably confident in its usability. I would still test the edge cases around INT64_MAX a little more.

If I'm not mistaken, the running time complexity of this algorithm is linear with respect to the size in bits of the input.

#include <iostream>
#include <cmath>
#include <cassert>
#include <limits>

class Real;

namespace std {
    inline bool isnan(const Real& r);
    inline bool isinf(const Real& r);
}

class Real {
public:
    Real(double val)
        : _val(val)
    {
        if (std::isnan(val)) { return; }
        if (std::isinf(val)) { return; }

        double d;

        if (modf(val, &d) == 0) {
            // already a whole number
            _num = val;
            _den = 1.0;
            return;
        }

        int exponent;
        double significand = frexp(val, &exponent); // val = significand * 2^exponent
        double numerator = val;
        double denominator = 1;

        // 0.5 <= significand < 1.0
        // significand is a fraction, multiply it by two until it's a whole number
        // subtract exponent appropriately to maintain val = significand * 2^exponent
        do {
            significand *= 2;
            --exponent;
            assert(std::ldexp(significand, exponent) == val);
        } while (modf(significand, &d) != 0);

        assert(exponent <= 0);  

        // significand is now a whole number
        _num = significand;
        _den = 1.0 / std::ldexp(1.0, exponent);

        assert(_val == _num / _den);
    }

    friend std::ostream& operator<<(std::ostream &os, const Real& rhs);
    friend bool std::isnan(const Real& r);
    friend bool std::isinf(const Real& r);

private:
    double _val = 0;
    double _num = 0;
    double _den = 0;
};

std::ostream& operator<<(std::ostream &os, const Real& rhs) {
    if (std::isnan(rhs) || std::isinf(rhs)) {
        return os << rhs._val;
    }
    if (rhs._den == 1.0) {
        return os << rhs._num;
    }
    return os << rhs._num << " / " << rhs._den;
}

namespace std {
    inline bool isnan(const Real& r) { return std::isnan(r._val); }
    inline bool isinf(const Real& r) { return std::isinf(r._val); }
}

#include <iomanip>

int main () {

    #define PRINT_REAL(num) \
        std::cout << std::setprecision(100) << #num << " = " << num << " = " << Real(num) << std::endl

    PRINT_REAL(1.5);
    PRINT_REAL(123.875);
    PRINT_REAL(0.125);

    // double precision issues
    PRINT_REAL(-10000000000000023.219238745);
    PRINT_REAL(-100000000000000000000000000000000000000000.5);

    return 0;
}

Upon looking at your code a little bit more, there's at least a problem with your testing for infinite values. Note the following program:

#include <limits>
#include <cassert>
#include <cmath>

int main() {
    {
        double d = std::numeric_limits<double>::max(); // about 1.7976931348623e+308

        assert(!std::isnan(d));
        assert(!std::isinf(d));

        // assert(d != d + 1); // fires
    }

    {
        double d = std::ldexp(1.0, 500); // 2 ^ 500
        assert(!std::isnan(d));
        assert(!std::isinf(d));

        // assert(d != d + 1); // fires
    }
}

In addition to that, if your GCD function doesn't support doubles, then you'll be limiting yourself in terms of values you can import as doubles. Try any number > INT64_MAX and the GCD function may not work.

vmrob
  • So? Now I'm stuck with yet another `double` value... or will this value always be 0.5? – barak manos Jan 18 '14 at 23:57
  • @barakmanos: The significand is returned as a `double` but is easily converted to other forms since its scale is known. If you have 64-bit integers available, simply multiply by 2**53 and convert to an integer. – Eric Postpischil Jan 19 '14 at 00:31
  • Turns out this is a lot more complex than I thought, I started implementing my own solution, but it's ugly and I don't like it. If I ever had to do this, I would use the Boost rational number library: http://www.boost.org/doc/libs/1_55_0/libs/rational/rational.html – vmrob Jan 19 '14 at 01:19
  • @vmrob,@Eric Postpischil: Please keep in mind that my primary goal here is to compute the numerator and the denomintor of a `double` value. I still don't see how the solution above helps me to achieve this at all (let alone, in a more accurate way than the one implemented in my question). – barak manos Jan 19 '14 at 04:54
  • @barakmanos I see that now. My own attempts at the problem were lackluster at best though I think I'll give it another try soon. At the very least, it's an interesting exercise for myself. – vmrob Jan 19 '14 at 08:05
  • @barakmanos The problem I see with the entire question is a matter of exactness. For example, the number 0.1 can't be represented in binary exactly. As a result, the implementation I came up finds the most exact answer to be `3602879701896397 / 36028797018963968` which is as close of a representation a 64 bit double will allow. What I'm working on right now is an exactness scale so that it can be simplified at the cost of accuracy. – vmrob Jan 19 '14 at 10:04
  • @vmrob: Thank you very much for your efforts. My implementation (described within the question) also yields the exact same result, and it does not rely on the FP representation, but simply on "shifting the double" until it becomes zero. So perhaps there is no way to achieve a more accurate result for this specific value (although I cannot prove this mathematically). – barak manos Jan 19 '14 at 11:42
  • @barakmanos Your method and the method described in this answer are both exact for IEEE 754 doubles (your method only when it works at all: it invokes undefined behavior when `(unsigned int)value` overflows). The double-precision number obtained by writing `0.1` in source code **is** 3602879701896397 times some power of two. No “proof” is involved. – Pascal Cuoq Jan 19 '14 at 16:18
  • @Pascal Cuoq: Thanks. I'm checking in the code for `inf` and `NaN`. Is it not enough to assert that no such overflow will take place? – barak manos Jan 19 '14 at 16:25
  • @barakmanos It is not enough to check for infinities. The double `1.0e20` is enough to overflow a 64-bit unsigned integer. – Pascal Cuoq Jan 19 '14 at 16:28
  • ...and I'm using a 32-bit unsigned integer in my code; thanks for the info. – barak manos Jan 19 '14 at 16:31
  • @vmrob, Thank you, but I'm wondering if this code achieves a higher accuracy when calculating the numerator and denominator of a given `double` value. Even **one specific value**, if you can provide an example. Otherwise, I might as well use the code I have provided within my question, which is far "cleaner" and easier to understand. Thanks again :) – barak manos Jan 21 '14 at 15:10