
I have a basic question. I want to use scikit-learn to fit a polynomial model to my data. I could do that by PolynomialFeatures but I want to fit a polynomial with some specific form.

For example, if I have 2 features I want to create a model such that:

F = a1 * x1 + a2 * x2 + a3 * x1 * x2 + a4 * x1^2 + a5 * x2^3

Can you please guide me on how I can do that? I could not find any example that I can use for my purpose.

Sam Mason
Mojmal

1 Answer


I've used the following method to try and fit curves that map to specific functions, you might be able to rework some of this to meet your needs:

First, define your model function: a function that takes a value of x and some set of parameters, and returns the associated y value.

You'll need to be sure that your function really is a function in the mathematical sense (i.e. that it returns a single value of y for any input value of x).

This is your "model" function - for example :

# A kind of elliptic curve with two parameters n, m
def mn_elliptical(x,m,n):
    return 1-((1 - (x)**(n))**(m))

For 2-dimensional models (i.e. where the inputs are x and y, and there's a third output z), there are ways of formulating your model and input data discussed here: https://scipython.com/blog/non-linear-least-squares-fitting-of-a-two-dimensional-data/ - see the end of this answer for an example of this in practice.
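As a sketch of how that applies to the question's specific polynomial: the two inputs x1, x2 can be packed into a single argument, with a1..a5 as the parameters for curve_fit to find. The coefficient values and synthetic data below are purely illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

# The question's form: F = a1*x1 + a2*x2 + a3*x1*x2 + a4*x1^2 + a5*x2^3
def poly_model(x1x2, a1, a2, a3, a4, a5):
    x1, x2 = x1x2
    return a1*x1 + a2*x2 + a3*x1*x2 + a4*x1**2 + a5*x2**3

# Synthetic check: generate data from known coefficients, then recover them
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 2, 50)
x2 = rng.uniform(0, 2, 50)
true_coeffs = (1.0, -2.0, 0.5, 3.0, -0.25)
F = poly_model((x1, x2), *true_coeffs)

params, cov = curve_fit(poly_model, (x1, x2), F)
```

Because the model is linear in the coefficients, the fit recovers them essentially exactly on noiseless data.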

Then, using the scipy.optimize.curve_fit method, you need to feed it a pair of arrays, one for the x's and one for the y's of the known observations you have collected, against which the fitting will take place.

xdata = [ ... all x values for all observations ... ] 
ydata = [ ... all observed y values in the same order as above ... ]


from scipy.optimize import curve_fit
fitp, fite = curve_fit(mn_elliptical, xdata, ydata)

This will yield fitp, the optimal parameters found by the fit, and fite, the estimated covariance matrix of those parameters - its diagonal describes how uncertain each fitted parameter is. If those values are too big, the parameters are poorly constrained and it's likely your model function isn't a good one.

You can help guide the process by supplying an initial guess and the expected bounds of the parameters you want to return, which can speed things up significantly - or, if you've got an awkwardly shaped function, it can help the fit home in on the right values that would otherwise get missed. These details are covered in more depth in the scipy curve_fit docs.
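For example, with the mn_elliptical model above, an initial guess goes in via p0 and per-parameter bounds as a (lower, upper) pair of lists - the synthetic data and the chosen bounds here are illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def mn_elliptical(x, m, n):
    return 1 - (1 - x**n)**m

# Synthetic observations generated from known parameters m=2, n=3
xdata = np.linspace(0.05, 0.95, 40)
ydata = mn_elliptical(xdata, 2.0, 3.0)

# p0 gives the optimiser a starting point; bounds constrain each
# parameter - here both m and n are kept between 0.1 and 10
fitp, fite = curve_fit(mn_elliptical, xdata, ydata,
                       p0=[1.0, 1.0], bounds=([0.1, 0.1], [10, 10]))
```

With bounds supplied, curve_fit switches to a bounded least-squares solver under the hood.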

Having validated and accepted the amount of error, you can then retrieve the parameters from fitp and use these to run additional values of x through your (now fitted) model, and get predicted results.

new_y = mn_elliptical(x, *fitp)

Which will yield a single result - use more advanced numpy/pandas methods to generate multiple results from arrays of x values that you supply.
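Concretely, a numpy array passed straight through the model evaluates element-wise, so one call produces many predictions (the parameter values below are illustrative, not real fitted output):

```python
import numpy as np

def mn_elliptical(x, m, n):
    return 1 - (1 - x**n)**m

# An array of inputs evaluates element-wise: one call, many predictions
xs = np.linspace(0, 1, 5)
fitp = [2.0, 3.0]          # hypothetical fitted parameters
new_ys = mn_elliptical(xs, *fitp)
```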



Just to demonstrate that 2-dimensional use-case, let's imagine a crudely plotted circle, with points A,B,C,D,E at the following xy coordinates (4,1), (6.5,3.5), (4,6), (1.5,3.5), (2,2)

[figure: hand-drawn circle through the five points A-E]

We know that a circle follows the formula (x-cx)^2+(y-cy)^2=r^2, so can write that in a fittable function form:

def circ(xy, cx, cy, r):
    x, y = xy
    return ((x - cx)**2 + (y - cy)**2) - r**2

Notice that I've flattened the return value to always be zero (at least for values of xy that conform) due to the nature of the formula.

We use the observed data points laid out here:

import numpy as np

xdata = np.array([4, 6.5, 4, 1.5, 2])
ydata = np.array([1, 3.5, 6, 3.5, 2])
zdata = np.array([0, 0, 0, 0, 0])

And transform that data following the method used in the linked article on 2-dimensional fitting.

xdata = np.vstack((xdata.ravel(), ydata.ravel()))
ydata = zdata.ravel()

And then feed this into the 2-d circ function.

curve_fit(circ,xdata,ydata)

This yields:

(array([ 4. ,  3.5, -2.5]),
 array([[ 0., -0., -0.],
        [-0.,  0., -0.],
        [-0., -0.,  0.]]))

The first part gives the fitted (cx, cy, r) parameters of the circ function: the first two are the x,y coordinates of the centre, and the third is the radius of the circle (returned here as -2.5; only its magnitude matters, since r appears only squared in the model). Based on my pencil-on-paper drawing, this is pretty much on the money.

The second part is the covariance matrix of the fitted parameters - values this close to zero mean the fit is tightly constrained, which for this hand-drawn example is not horrendous.
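If you'd rather have a per-parameter error bar than the full matrix, a common recipe is the square root of the covariance diagonal - the fite values below are made up for illustration:

```python
import numpy as np

# fite stands in for the covariance matrix returned by curve_fit;
# the square roots of its diagonal are one-standard-deviation
# uncertainties for each fitted parameter (here: cx, cy, r)
fite = np.array([[0.04, 0.0, 0.0],
                 [0.0, 0.04, 0.0],
                 [0.0, 0.0, 0.01]])   # illustrative values
perr = np.sqrt(np.diag(fite))
```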

Thomas Kimber