
I have a dataset consisting of xdata and ydata that I want to fit a polynomial to, but for some reason the fitting results depend on the dtype of the dataset, even though the actual values of the data remain unchanged. I understand that changing the dtype, e.g. from float to int, can lose information, but in this case I am converting from 'f4' to 'f8', so no information is lost, which is why I am at a loss. What is going on here?

import numpy as np
from numpy.polynomial import polynomial

x32 = np.array([
    1892.8972, 1893.1168, 1893.1626, 1893.4313, 1893.4929, 1895.6392,
    1895.7642, 1896.4286, 1896.5693, 1897.313,  1898.4648
], dtype='f4')

y32 = np.array([
    510.83655, 489.91592, 486.4508,  469.21814, 465.7902,  388.65576,
    385.37637, 369.07236, 365.8301,  349.7118,  327.4062
], dtype='f4')

x64 = x32.astype('f8')
y64 = y32.astype('f8')

# np.polyfit returns coefficients from the highest degree down
a, residuals1, _, _, _ = np.polyfit(x32, y32, 2, full=True)
b, residuals2, _, _, _ = np.polyfit(x64, y64, 2, full=True)

# numpy.polynomial.polynomial.polyfit returns them from the lowest degree up
c, (residuals3, _, _, _) = polynomial.polyfit(x32, y32, 2, full=True)
d, (residuals4, _, _, _) = polynomial.polyfit(x64, y64, 2, full=True)

print(residuals1, residuals2, residuals3, residuals4)  # [] [195.86309188] [] [195.86309157]
print(a)        # [ 3.54575804e+00 -1.34738721e+04  1.28004924e+07]
print(b)        # [-8.70836523e-03  7.50419309e-02  3.15525483e+04]
print(c[::-1])  # [ 3.54575804e+00 -1.34738721e+04  1.28004924e+07]
print(d[::-1])  # [-8.7083541e-03   7.5099051e-02   3.1552398e+04 ]

I only noticed this issue because I am also interested in the residuals, and for the float32 data they turned out to be empty, which caused my program to crash.
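
For reference, a minimal defensive sketch of what could be done instead of indexing into the residuals blindly (the helper name fit_quadratic is just for illustration): np.polyfit takes its residuals from np.linalg.lstsq, which returns an empty array when the fit is rank-deficient.

import numpy as np

def fit_quadratic(x, y):
    """Degree-2 fit that tolerates an empty residuals array (illustrative helper)."""
    coeffs, residuals, rank, _, _ = np.polyfit(x, y, 2, full=True)
    if residuals.size == 0:
        # lstsq returned no residuals (rank-deficient fit): compute the sum of squares directly
        rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    else:
        rss = float(residuals[0])
    return coeffs, rss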


2 Answers


This differing behaviour is due to the default value of rcond in polyfit, which is precision-dependent:

    rcond : float, optional
        Relative condition number of the fit. Singular values smaller than
        this relative to the largest singular value will be ignored. The
        default value is len(x)*eps, where eps is the relative precision of
        the float type, about 2e-16 in most cases.

...

    # set rcond
    if rcond is None:
        rcond = len(x)*finfo(x.dtype).eps

Setting rcond to an appropriately small value for the 32-bit example (e.g. rcond=1e-7 or smaller) will produce the same results as the 64-bit one.
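
For illustration, a small sketch of applying this to the question's x32/y32 (the 1e-7 value is just the suggestion above; both fitting functions accept rcond):

import numpy as np
from numpy.polynomial import polynomial

# Default cutoff np.polyfit would pick for the 11-point float32 data:
default_rcond_32 = 11 * np.finfo(np.float32).eps   # ~1.3e-06

# Passing an explicit cutoff overrides the dtype-dependent default, so both
# dtypes use the same threshold for discarding small singular values.
coef_np, resid_np, rank_np, sv_np, _ = np.polyfit(
    x32, y32, 2, full=True, rcond=1e-7)
coef_poly, (resid_poly, rank_poly, sv_poly, rcond_used) = polynomial.polyfit(
    x32, y32, 2, full=True, rcond=1e-7)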

  • Thanks! If only I had studied the documentation more thoroughly, though I still would probably never have guessed this was the reason. Good to know that this parameter can occasionally really be something to look out for. – mapf Apr 09 '21 at 08:47

The difference happens because the default value of the rcond parameter of polyfit() differs for float32 and float64. rcond is a relative cutoff for the singular values used in the least-squares fit; its default is len(x) times the machine epsilon of the input dtype, which is about 1.2e-7 for float32 versus about 2.2e-16 for float64. If you specify the same rcond yourself, you will get the same results.
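
Those epsilon values can be checked directly with np.finfo (a quick sketch, separate from the code below):

import numpy as np

print(np.finfo(np.float32).eps)  # 1.1920929e-07
print(np.finfo(np.float64).eps)  # 2.220446049250313e-16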

The code below passes rcond explicitly and also draws plots using np.polyval to show that the fits are visually almost identical.

Try it online!

import numpy as np
from numpy.polynomial import polynomial
import matplotlib.pyplot as plt

x32 = np.array([
    1892.8972, 1893.1168, 1893.1626, 1893.4313, 1893.4929, 1895.6392,
    1895.7642, 1896.4286, 1896.5693, 1897.313,  1898.4648
], dtype='f4')

y32 = np.array([
    510.83655, 489.91592, 486.4508,  469.21814, 465.7902,  388.65576,
    385.37637, 369.07236, 365.8301,  349.7118,  327.4062
], dtype='f4')

x64 = x32.astype('f8')
y64 = y32.astype('f8')

rcond = 2e-7

a, residuals1, _, _, _ = np.polyfit(x32, y32, 2, full=True, rcond=rcond)
b, residuals2, _, _, _ = np.polyfit(x64, y64, 2, full=True, rcond=rcond)

c, (residuals3, _, _, _) = polynomial.polyfit(x32, y32, 2, full=True, rcond=rcond)
d, (residuals4, _, _, _) = polynomial.polyfit(x64, y64, 2, full=True, rcond=rcond)

print(residuals1, residuals2, residuals3, residuals4)  
# [] [195.86309188] [] [195.86309157]
print(a)  # [ 3.54575804e+00 -1.34738721e+04  1.28004924e+07]
print(b)  # [-8.70836523e-03  7.50419309e-02  3.15525483e+04]
print(c)  # [ 1.28004924e+07 -1.34738721e+04  3.54575804e+00]
print(d)  # [ 3.1552398e+04  7.5099051e-02 -8.7083541e-03]

plt.plot(x64, y64, label='orig')
plt.plot(x32, np.polyval(a, x32), label='x32_v0')
plt.plot(x64, np.polyval(b, x64), label='x64_v0')
plt.plot(x32, np.polyval(c[::-1], x32), label='x32_v1')
plt.plot(x64, np.polyval(d[::-1], x64), label='x64_v1')
plt.legend()
plt.show()

[Figure: plot of the original data ('orig') together with the four fitted curves (x32_v0, x64_v0, x32_v1, x64_v1)]
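
For a numeric rather than purely visual comparison, one could also evaluate the two np.polyfit results a and b (from the code above) on a dense grid; this is only a sketch:

import numpy as np

# Largest pointwise gap between the float32-based and float64-based fits.
xs = np.linspace(x64.min(), x64.max(), 200)
print(np.max(np.abs(np.polyval(a, xs) - np.polyval(b, xs))))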

  • Thank you! I would have never guessed this to be the cause. Good to know the visuals line up. – mapf Apr 09 '21 at 08:58