
I wrote the following example of polynomial approximation, using MSE and backprop (plain gradient descent):

import numpy as np
import matplotlib.pyplot as plt

x = np.array([1,2,3,4,5])
y = np.array([5,15,25,45,65])

grade = 3

def interpolate_backprop(x, y, grade, alfa=1e-4, steps=100000):
    # feature matrix: one row [x**0, x**1, ..., x**(grade-1)] per sample
    inputs = [[xi**n for n in range(grade)] for xi in x]
    inputs = np.array(inputs, dtype=np.double)
    #alfa /= grade*grade
    weights = np.random.random(size=grade)
    #weights = np.zeros_like(inputs[0])
    #max_error = 0.1
    #min_error = - max_error
    for n in range(steps):
        # stochastic gradient descent: cycle through the samples one at a time
        i = n % len(x)
        input = inputs[i]
        output = np.sum(input*weights)
        error = output - y[i]
        #error = min(max(error, min_error), max_error)
        if n % 10000 == 0:
            print("error =", error)
        # gradient of the squared error w.r.t. the weights is proportional to error*input
        weights -= alfa*error*input
    return weights

weights = interpolate_backprop(x,y,grade)

# evaluate the fitted polynomial at the training points
inputs = np.array([[xi**n for n in range(grade)] for xi in x])
for n, input in enumerate(inputs):
    print("for x = ", input[1], " : ", np.sum(input*weights), "expected:", y[n])

# dense grid for plotting the fitted curve against the data points
xx = np.arange(x[0]-1, x[-1]+1, step=0.1)
print(xx)

inputs = np.array([[xi**n for n in range(grade)] for xi in xx])

yy = np.array([np.sum(input*weights) for input in inputs])

plt.plot(x,y,xx,yy)

plt.show()

I discovered that this works great for small polynomial grades, like 3-5, but at high grades I always run into exploding errors and float overflow, even if I use a very small alfa or put bounds on the error value (see max_error and min_error, currently commented out). This happens because the update alfa*error*input, which follows from the derivative of the error with respect to the current weight, contains input = x**n, and a large x**n makes the weights explode.
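
To show the scale of the problem, here is a quick check (a small sketch, not part of the script above) of the largest feature value x**(grade-1), which multiplies the error in the weight update:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=np.double)
for grade in (3, 5, 10):
    # the biggest entry of a feature row is x_max**(grade-1)
    print(grade, x[-1] ** (grade - 1))
# prints 25.0, 625.0 and 1953125.0 respectively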

What can I do to make the backprop better? One option would be to remove the factor input from that line, i.e. weights -= alfa*error  # *input, but then the derivative is no longer calculated properly, which results in an incorrect polynomial: it is less accurate even for lower grades and still does not solve the exploding-weights problem. For example, for grade = 10 I have to use an incredibly low alfa of 1e-16 and the approximation is bad anyway. Any suggestions on how to improve it?
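
For clarity, the modified update I mean (shown in isolation, with the same variable names as in interpolate_backprop above) would be:

        output = np.sum(input*weights)
        error = output - y[i]
        weights -= alfa*error  # "input" factor dropped, so this is no longer the true gradient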

  • There are many initial conditions which will cause this algorithm to fail, even if implemented perfectly: non-convex function, poor scaling, etc. Have you verified that your "high grade" polynomials are strictly convex? How about scale? Gradient descent is sensitive to floating point error. How does this algorithm perform against "low grade" (low degree) versions of [Rosenbrock's function](https://en.wikipedia.org/wiki/Rosenbrock_function), with appropriate initial condition? – Michael Ruth Jan 21 '23 at 01:17
  • Yes, it can be non-convex, but I do not think that is the issue. The problem is that high values of input make the weights explode, and if you correct for that, e.g. divide by the maximum value of input, it destroys or at least hurts the convergence. I do not know the Rosenbrock function, I need to check it. The alternative I know is Newton interpolation, which gives an exact polynomial with grade equal to the number of points. – Smer5 Jan 21 '23 at 02:14
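
The scaling attempt mentioned in the comment above might look roughly like this inside the training loop (a sketch only; it assumes each sample's feature row is divided by its largest entry, which the question does not show):

        input = inputs[i]
        input = input / input.max()   # assumed normalization: largest feature becomes 1
        output = np.sum(input*weights)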

0 Answers