
I have a signal that I want to fit with a polynomial and then use that polynomial expression for extrapolation. That is, my original x-axis range is 192.1–196 THz, and I have the corresponding y-axis values for that range. But I want to plot the signal from 191.325–196.125 THz, which means I need to predict y-axis values outside my original x-axis range. Here is my code:

import numpy as np
import matplotlib.pyplot as plt
import numpy.polynomial.polynomial as poly




y=[9.996651009,10.13327408,10.27003353,10.43754263,10.60523071,10.71136134,10.81753489,10.92299416,11.02853271,11.10093835,11.17349619,11.16565259,11.15787539,11.16191795,11.16626826,11.16087842,11.15561384,11.16940196,11.18346164,11.22434472,11.26537218,11.35981126,11.45427174,11.6102862,11.76633923,11.89058563,12.01486743,12.21133227,12.40782023,12.55682327,12.7059072,12.88910065,13.07234341,13.20139753,13.33048759,13.46960371,13.60875466,13.71956075,13.83042354,13.91157425,13.99277241,14.08548465,14.17824161,14.26091536,14.34364144,14.43504833,14.52651725,14.59548578,14.66449439,14.73751377,14.81058218,14.88687742,14.96322511,15.038278,15.11342222,15.1644832,15.21556914,15.31798775,15.42044285,15.51000031,15.59959896,15.68089263,15.76229354,15.86213633,15.96214484,16.0269976,16.0923806,16.16483146,16.23762336,16.24233076,16.24719576,16.26116812,16.27543692,16.28311024,16.29128992,16.23458771,16.17830522,16.12636211,16.07478778]

y.reverse()


# x axis for polyfit
i = 192.1e12
x=[]
while i <= 196e12:
  x.append(i)
  i += 50e9

plt.plot(x, y)


# x axis for extrapolation
i1= 191.325e12
x2=[]
while i1 <= 196.125e12:
  x2.append(i1)
  i1 += 50e9


z = poly.polyfit(x, y, 20)
f = poly.polyval(x, z)

plt.plot(x2, f(x2), linestyle='--')

Problems:

1. The error I am getting: `'numpy.ndarray' object is not callable`, raised at the `plt.plot(x2, f(x2), linestyle='--')` line. I cannot find a workaround.

2. If I put more than 21 as the polynomial degree, then I run into the following problem: `SVD did not converge in Linear Least Squares`, raised by the `poly.polyfit` call. I basically have no idea about this problem.

Can anyone give me some ideas? Thank you.

MRR
  • `If I put more than 21 as the number of coefficients then I run into the following problem: SVD did not converge` Is there a good reason to expect a really high degree polynomial to provide a good fit? I would bet that the polynomial goes crazy as soon as you move slightly outside of the area where you have known values. – Nick ODell Jun 13 '23 at 21:15
  • @NickODell It is true that the polynomial will go crazy. However, I do want to know about this error message in general, and I would also like to compare the accuracy of different polynomial orders. – MRR Jun 15 '23 at 15:39

2 Answers


Extrapolation is inaccurate, but you can do it using scipy.interpolate.interp1d by telling it not to raise bounds errors (`bounds_error=False`) and to fill values via extrapolation (`fill_value="extrapolate"`). In my code, I used a cubic spline fit. Also, make use of numpy functions to generate the x arrays.

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

plt.close("all")


y = [9.996651009, 10.13327408, 10.27003353, 10.43754263, 10.60523071, 
     10.71136134, 10.81753489, 10.92299416, 11.02853271, 11.10093835, 
     11.17349619, 11.16565259, 11.15787539, 11.16191795, 11.16626826, 
     11.16087842, 11.15561384, 11.16940196, 11.18346164, 11.22434472, 
     11.26537218, 11.35981126, 11.45427174, 11.6102862, 11.76633923, 
     11.89058563, 12.01486743, 12.21133227, 12.40782023, 12.55682327, 
     12.7059072, 12.88910065, 13.07234341, 13.20139753, 13.33048759, 
     13.46960371, 13.60875466, 13.71956075, 13.83042354, 13.91157425, 
     13.99277241, 14.08548465, 14.17824161, 14.26091536, 14.34364144, 
     14.43504833, 14.52651725, 14.59548578, 14.66449439, 14.73751377, 
     14.81058218, 14.88687742, 14.96322511, 15.038278, 15.11342222, 
     15.1644832, 15.21556914, 15.31798775, 15.42044285, 15.51000031, 
     15.59959896, 15.68089263, 15.76229354, 15.86213633, 15.96214484, 
     16.0269976, 16.0923806, 16.16483146, 16.23762336, 16.24233076, 
     16.24719576, 16.26116812, 16.27543692, 16.28311024, 16.29128992, 
     16.23458771, 16.17830522, 16.12636211, 16.07478778]
y = np.array(y)[::-1]

step = 50e9
x = np.arange(192.1e12, 196e12+step, step)
x2 = np.arange(191.325e12, 196.125e12+step, step)


f = interp1d(x, y, kind="cubic", bounds_error=False, fill_value="extrapolate")

fig, ax = plt.subplots()
ax.plot(x, y, ".", label="data")
ax.plot(x2, f(x2), "--", label="fit")
ax.legend()
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.show()

[Plot: data points with a dashed cubic-spline extrapolation over 191.325–196.125 THz]


Edit: If you do need a polynomial fit, then make use of np.poly1d. It returns a callable polynomial object that can be evaluated at new points, unlike polyval, which only returns values (hence the error you were getting).

import numpy as np
import matplotlib.pyplot as plt

plt.close("all")

y = [9.996651009, 10.13327408, 10.27003353, 10.43754263, 10.60523071, 
     10.71136134, 10.81753489, 10.92299416, 11.02853271, 11.10093835, 
     11.17349619, 11.16565259, 11.15787539, 11.16191795, 11.16626826, 
     11.16087842, 11.15561384, 11.16940196, 11.18346164, 11.22434472, 
     11.26537218, 11.35981126, 11.45427174, 11.6102862, 11.76633923, 
     11.89058563, 12.01486743, 12.21133227, 12.40782023, 12.55682327, 
     12.7059072, 12.88910065, 13.07234341, 13.20139753, 13.33048759, 
     13.46960371, 13.60875466, 13.71956075, 13.83042354, 13.91157425, 
     13.99277241, 14.08548465, 14.17824161, 14.26091536, 14.34364144, 
     14.43504833, 14.52651725, 14.59548578, 14.66449439, 14.73751377, 
     14.81058218, 14.88687742, 14.96322511, 15.038278, 15.11342222, 
     15.1644832, 15.21556914, 15.31798775, 15.42044285, 15.51000031, 
     15.59959896, 15.68089263, 15.76229354, 15.86213633, 15.96214484, 
     16.0269976, 16.0923806, 16.16483146, 16.23762336, 16.24233076, 
     16.24719576, 16.26116812, 16.27543692, 16.28311024, 16.29128992, 
     16.23458771, 16.17830522, 16.12636211, 16.07478778]
y = np.array(y)[::-1]

step = 50e9
x = np.arange(192.1e12, 196e12+step, step)
x2 = np.arange(191.325e12, 196.125e12+step, step)

z = np.polyfit(x, y, 20)
p = np.poly1d(z)

fig, ax = plt.subplots()
ax.plot(x, y, ".", label="data")
ax.plot(x2, p(x2), "--", label="fit")
ax.legend()
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.show()

[Plot: data points with a dashed 20th-degree polynomial fit via np.poly1d]


Edit: Using the newer np.polynomial.Polynomial interface, we get a similar result with a 5th-order polynomial fit.

import numpy as np
import matplotlib.pyplot as plt
from numpy.polynomial import Polynomial

plt.close("all")

y = [9.996651009, 10.13327408, 10.27003353, 10.43754263, 10.60523071, 
     10.71136134, 10.81753489, 10.92299416, 11.02853271, 11.10093835, 
     11.17349619, 11.16565259, 11.15787539, 11.16191795, 11.16626826, 
     11.16087842, 11.15561384, 11.16940196, 11.18346164, 11.22434472, 
     11.26537218, 11.35981126, 11.45427174, 11.6102862, 11.76633923, 
     11.89058563, 12.01486743, 12.21133227, 12.40782023, 12.55682327, 
     12.7059072, 12.88910065, 13.07234341, 13.20139753, 13.33048759, 
     13.46960371, 13.60875466, 13.71956075, 13.83042354, 13.91157425, 
     13.99277241, 14.08548465, 14.17824161, 14.26091536, 14.34364144, 
     14.43504833, 14.52651725, 14.59548578, 14.66449439, 14.73751377, 
     14.81058218, 14.88687742, 14.96322511, 15.038278, 15.11342222, 
     15.1644832, 15.21556914, 15.31798775, 15.42044285, 15.51000031, 
     15.59959896, 15.68089263, 15.76229354, 15.86213633, 15.96214484, 
     16.0269976, 16.0923806, 16.16483146, 16.23762336, 16.24233076, 
     16.24719576, 16.26116812, 16.27543692, 16.28311024, 16.29128992, 
     16.23458771, 16.17830522, 16.12636211, 16.07478778]
y = np.array(y)[::-1]

step = 50e9
x = np.arange(192.1e12, 196e12+step, step)
x2 = np.arange(191.325e12, 196.125e12+step, step)

p = Polynomial.fit(x, y, 5)

fig, ax = plt.subplots()
ax.plot(x, y, ".", label="data")
ax.plot(x2, p(x2), "--", label="fit")
ax.legend()
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.show()

[Plot: data points with a dashed 5th-order Polynomial.fit extrapolation]

jared
  • Thank you for the detailed explanation. However, the documentation (https://numpy.org/doc/stable/reference/routines.polynomials.html) clearly states to avoid np.polyfit, np.polyval, and np.poly1d, and to use only the newer package instead. That is why I tried to avoid np.poly1d. – MRR Jun 13 '23 at 20:47
  • I've updated my answer to include code using `np.polynomial.Polynomial`. – jared Jun 13 '23 at 21:01
  • @MRR does this now answer your question? – jared Jun 14 '23 at 17:42
  • Thank you for your revised answer; it answers my first problem. However, I am still waiting for an answer to the second one. – MRR Jun 15 '23 at 15:38
  • As per the [documentation](https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html), "Note that fitting polynomial coefficients is inherently badly conditioned when the degree of the polynomial is large." It recommends using `x - x.mean()` to improve the fit, and from my tests the fit is, in fact, improved, but the extrapolation is worse (a sketch of this trick follows these comments). – jared Jun 15 '23 at 17:22
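A minimal sketch of that centering trick (an illustration, not code from the answer; it assumes the `x`, `y`, and `x2` arrays built in the code above):

import numpy as np

# Fit in a shifted coordinate so the values passed to polyfit stay small,
# then evaluate the extrapolation grid with the same shift applied.
xm = x.mean()
z = np.polyfit(x - xm, y, 20)   # may still warn: degree 20 is badly conditioned
p = np.poly1d(z)
y2 = p(x2 - xm)                 # extrapolated values on the wider grid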

> It is true that the polynomial will go crazy. However, I do want to know about this error message in general, and I would also like to compare the accuracy of different polynomial orders.

It's useful to take a moment and think about the intermediate values the polynomial-fitting code works with while fitting your polynomial. Your x values are on the order of 1.9 * 10^14. Raise the largest of them to the 22nd power and you get about 2.8 * 10^314, which is bigger than the maximum representable double-precision float (about 1.8 * 10^308).
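As a quick check (a minimal sketch, not from the original answer), you can confirm the overflow directly:

import numpy as np

print(np.finfo(np.float64).max)   # ~1.8e308, the largest representable double

x_max = np.float64(1.96e14)       # the largest x value in the data
print(x_max ** 22)                # inf, with an overflow RuntimeWarning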

Interestingly, even when your X^20 term fits into the range of a float, the fit still suffers from floating-point precision problems. Here are the coefficients it finds for a degree-20 polynomial:

>>> print(z)
[ 7.01389718e+008 -7.23272471e-006 -6.18262251e-021  1.28113394e-034
  6.59034262e-049 -5.97742101e-066 -1.75197282e-077 -9.00617222e-092
  1.16972981e-106  3.58691274e-120 -9.24694178e-135  0.00000000e+000
  0.00000000e+000  0.00000000e+000  0.00000000e+000  0.00000000e+000
  0.00000000e+000  0.00000000e+000  0.00000000e+000  0.00000000e+000
  0.00000000e+000]

In other words, every coefficient it fits beyond X^10 is zero. This happens because of floating-point precision limits.

How can this be fixed? You could take inspiration from machine-learning practitioners and standardize your inputs. "Standardization" means subtracting the mean of the input and dividing by its standard deviation. Since this is a linear transformation, fitting in the standardized coordinate is mathematically equivalent to fitting without standardization, while avoiding issues of overflow. When I do this, the coefficient it chooses for X^22 is -0.11, which is the equivalent of -5.3 * 10^-316 without standardization. That value is below the smallest normal double, so without standardization it can only be stored as a subnormal with greatly reduced precision.

Here is the code I used:

import numpy as np
import matplotlib.pyplot as plt
import numpy.polynomial.polynomial as poly
from sklearn.preprocessing import StandardScaler




y=[9.996651009,10.13327408,10.27003353,10.43754263,10.60523071,10.71136134,10.81753489,10.92299416,11.02853271,11.10093835,11.17349619,11.16565259,11.15787539,11.16191795,11.16626826,11.16087842,11.15561384,11.16940196,11.18346164,11.22434472,11.26537218,11.35981126,11.45427174,11.6102862,11.76633923,11.89058563,12.01486743,12.21133227,12.40782023,12.55682327,12.7059072,12.88910065,13.07234341,13.20139753,13.33048759,13.46960371,13.60875466,13.71956075,13.83042354,13.91157425,13.99277241,14.08548465,14.17824161,14.26091536,14.34364144,14.43504833,14.52651725,14.59548578,14.66449439,14.73751377,14.81058218,14.88687742,14.96322511,15.038278,15.11342222,15.1644832,15.21556914,15.31798775,15.42044285,15.51000031,15.59959896,15.68089263,15.76229354,15.86213633,15.96214484,16.0269976,16.0923806,16.16483146,16.23762336,16.24233076,16.24719576,16.26116812,16.27543692,16.28311024,16.29128992,16.23458771,16.17830522,16.12636211,16.07478778]

y.reverse()


# x axis for polyfit
i = 192.1e12
x=[]
while i <= 196e12:
    x.append(i)
    i += 50e9

plt.plot(x, y)


# x axis for extrapolation
i1= 191.325e12
x2=[]
while i1 <= 196.125e12:
    x2.append(i1)
    i1 += 50e9

x = np.array(x)
x2 = np.array(x2)

scaler = StandardScaler()
scaler.fit(x.reshape(-1, 1))
x_rescaled = scaler.transform(x.reshape(-1, 1))
x2_rescaled = scaler.transform(x2.reshape(-1, 1))

z = poly.polyfit(x_rescaled.reshape(-1), y, 20)
f = poly.polyval(x2_rescaled.reshape(-1), z)

plt.plot(x2, f, linestyle='--')
plt.ylim([7, 20])

[Plot: 20th-degree fit on standardized inputs, extrapolated over the full range]

I should mention that doing an extrapolation on a 20th degree polynomial is likely to be useless outside the region you are fitting. A polynomial of that degree is extremely sensitive to small changes.

For example, I added random noise of magnitude 0.01 and re-fit the polynomial, repeating the process nine times. Visually, you can barely see the noise, but it has an enormous effect on the fit: half of the resulting polynomials point up outside the fitting region, and half point down.

[Plot: nine 20th-degree re-fits of slightly noisy data; the extrapolated tails diverge up and down]
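A minimal sketch of that noise experiment (my reconstruction, not the answer's original code; it reuses `x`, `x2`, `x_rescaled`, `x2_rescaled`, and `y` from the script above):

import numpy as np
import matplotlib.pyplot as plt
import numpy.polynomial.polynomial as poly

rng = np.random.default_rng(0)
y_arr = np.asarray(y)

for _ in range(9):
    # perturb the data with barely visible Gaussian noise, then re-fit degree 20
    y_noisy = y_arr + rng.normal(0, 0.01, size=y_arr.size)
    z_noisy = poly.polyfit(x_rescaled.reshape(-1), y_noisy, 20)
    plt.plot(x2, poly.polyval(x2_rescaled.reshape(-1), z_noisy), alpha=0.6)

plt.plot(x, y_arr, 'k.', markersize=3)
plt.ylim([7, 20])
plt.show()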

See also Why is the use of high order polynomials for regression discouraged?

Nick ODell
  • Thank you for this informative insight, but I am a bit curious why the extrapolated part (outside of my fitting region) goes up sometimes and down other times. Why am I not getting a consistent pattern? – MRR Jun 16 '23 at 20:35
  • @MRR The random noise I'm adding ends up having a non-zero correlation with the high-order elements of the polynomial. It's a small correlation, sometimes negative, sometimes positive, but it's not zero. Therefore, the polynomial fit can be made very slightly better by changing the high-order polynomial coefficients. Changing the high-order polynomial coefficient has a small effect in the middle of the data, but a massive effect outside the fitting region. – Nick ODell Jun 16 '23 at 21:12