
I'm battling some floating-point problems in the pandas `read_csv` function. During my investigation, I found this:

In [15]: a = 5.9975

In [16]: a
Out[16]: 5.9975

In [17]: np.float64(a)
Out[17]: 5.9974999999999996

Why do Python's built-in float and NumPy's np.float64 type give different results? I thought they were both C doubles?

mchangun

1 Answer

>>> import numpy
>>> numpy.float64(5.9975).hex()
'0x1.7fd70a3d70a3dp+2'
>>> (5.9975).hex()
'0x1.7fd70a3d70a3dp+2'

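They are the same number, as the identical hex strings show. What differs is how each type is printed: Python's native float uses a "sane" representation (the shortest decimal string that converts back to the same double), while the NumPy output shown in the question uses a more accurate one (17 significant digits, which is enough to identify any double uniquely).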
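The two print styles can be reproduced with the standard library alone. This is a minimal sketch, assuming that the 17-digit output in the question corresponds to a `.17g` format; it only emulates the display and does not call NumPy itself. The variable names are illustrative.

```python
a = 5.9975

# Shortest decimal string that round-trips to the same double
# (this is what Python's built-in float repr produces).
shortest = repr(a)
print(shortest)    # 5.9975

# 17 significant digits, matching the output shown in the question.
seventeen = format(a, ".17g")
print(seventeen)   # 5.9974999999999996

# Both strings parse back to exactly the same double.
assert float(shortest) == float(seventeen) == a

# The underlying bits are unchanged by the choice of format.
print(a.hex())     # 0x1.7fd70a3d70a3dp+2
```

Either string is a valid label for the same stored value; only the number of digits printed differs.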

Ignacio Vazquez-Abrams
  • 2
    By representation, you mean the way it is printed to screen? – mchangun Nov 24 '14 at 06:16
  • 2
    Via the `__repr__()` method or its C-level equivalent, yes. – Ignacio Vazquez-Abrams Nov 24 '14 at 06:18
  • 4
    A truly *accurate* representation would actually be 5.99749999999999960920149533194489777088165283203125, which is the exact decimal value of the 64-bit float you get when you evaluate the float literal `5.9975`. – Mark Amery Mar 17 '16 at 12:19
  • 3
    @MarkAmery The max precision a float64 can reach is close to 10^-16 (unit in the last place (ULP), see https://en.wikipedia.org/wiki/Floating-point_arithmetic), so the idea of an exact decimal value with significantly more than 16 digits for a floating-point number is misleading. – Jonathan Nappee Jul 03 '17 at 13:02
  • 7
    @JonathanNappee: Every numeric binary64 representation does in fact have an exact decimal equivalent. The trouble occurs when we believe that a much less precise decimal value is represented by a given binary64 value. – Ignacio Vazquez-Abrams Jul 03 '17 at 13:51
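The exact decimal value mentioned in the comments can be checked with the standard library. This is a small sketch, relying on the fact that `decimal.Decimal` and `fractions.Fraction` convert a float without rounding; the variable names are illustrative.

```python
from decimal import Decimal
from fractions import Fraction

a = 5.9975

# Decimal(float) converts the stored binary64 value exactly, digit for digit.
exact = Decimal(a)
print(exact)
# 5.99749999999999960920149533194489777088165283203125

# The same value as an exact ratio: an integer over a power of two.
ratio = Fraction(a)
print(ratio.denominator == 2**50)   # True

# The decimal literal 5.9975 itself is a different (nearby) number.
print(Decimal("5.9975") == exact)   # False
```

The expansion terminates because the stored value is an integer divided by 2**50, which is why every binary64 value has a finite exact decimal form.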