
In a book I found the following code, which fits a LinearRegression to quadratic data:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

m = 100
X = 6 * np.random.rand(m, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
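Running a reproducible variant of that snippet (the fixed seed and the printout are additions, not from the book) shows the fit does recover coefficients close to the true values 2, 1 and 0.5:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(42)  # fixed seed so the result is reproducible
m = 100
X = 6 * np.random.rand(m, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)

X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
lin_reg = LinearRegression().fit(X_poly, y)

# intercept_ should land near 2, coef_ near [1, 0.5]
print(lin_reg.intercept_, lin_reg.coef_)
```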


But how can that be? I know from the documentation that PolynomialFeatures(degree=2, include_bias=False) creates an array which looks like:

[[X[0],X[0]**2]
[X[1],X[1]**2]
.....
[X[n],X[n]**2]]
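That documented behaviour is easy to check directly on a tiny input: for a single feature, the transform just appends the square of each value as a second column.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0], [3.0]])
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
print(X_poly)  # [[2. 4.]
               #  [3. 9.]]
```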

BUT: How is the LinearRegression able to fit this data? In other words, WHAT is the LinearRegression doing, and what is the concept behind this?

I am grateful for any explanations!

2Obe

1 Answer


PolynomialFeatures with degree two (and with the default include_bias=True) will create an array that looks like:

   [[1, X[0], X[0]**2]
    [1, X[1], X[1]**2]
    .....
    [1, X[n], X[n]**2]]

With include_bias=False, as in your code, the column of ones is dropped; LinearRegression then fits the intercept term itself (fit_intercept=True by default), which comes to the same thing.

Let's call the matrix above X. Then LinearRegression looks for 3 numbers a, b, c such that the vector

X * [[a],[b],[c]] - Y

has the smallest possible mean squared error (which is just the mean of the squares of the entries of the vector above).

Note that the product X * [[a],[b],[c]] is just the product of the matrix X with the column vector [a,b,c].T. The result is a vector of the same dimension as Y.
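The same least-squares problem can be sketched directly with NumPy (the data here is made up for illustration; np.linalg.lstsq finds exactly the a, b, c that minimize that squared error):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=50)
Y = 0.5 * x**2 + x + 2 + rng.normal(size=50)

# Build the matrix [[1, x, x**2], ...] described above
X = np.column_stack([np.ones_like(x), x, x**2])

# Least squares: the (a, b, c) minimizing ||X @ [a, b, c] - Y||**2
(a, b, c), *_ = np.linalg.lstsq(X, Y, rcond=None)

residual = X @ np.array([a, b, c]) - Y
print(a, b, c)               # close to 2, 1, 0.5
print(np.mean(residual**2))  # the mean squared error being minimized
```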

Regarding the questions in your comment:

  1. This function is linear in the new set of features: x, x**2. Just think about x**2 as an additional feature in your model.

  2. For the particular array mentioned in your question, the LinearRegression method is looking for numbers a,b,c that minimize the sum

    (a*1 + b*X[0] + c*X[0]**2 - Y[0])**2 + (a*1 + b*X[1] + c*X[1]**2 - Y[1])**2 + ... + (a*1 + b*X[n] + c*X[n]**2 - Y[n])**2

So it will find a set of such numbers a,b,c. Hence the suggested function y=a+b*x+c*x**2 is not based only on the first row. Instead, it is based on all the rows, because the parameters a,b,c that are chosen are those that minimize the sum above, and this sum involves elements from all the rows.

  3. Once you have created the vector x**2, the linear regression just regards it as an additional feature. You can give it a new name, v = x**2. Then the linear regression has the form y = a + b*x + c*v, which means it is linear in x and v. The algorithm does not care how you created v; it just treats v as an additional feature.
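That renaming can be made concrete: fitting LinearRegression on the two ordinary columns x and v = x**2 (simulated data here, matching the question's setup) produces the same kind of model as the polynomial pipeline, because the two feature matrices are identical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * x**2 + x + 2 + rng.normal(size=(100, 1))

v = x**2                      # a new name for the squared feature
features = np.hstack([x, v])  # two ordinary features: x and v

reg = LinearRegression().fit(features, y)
# y = a + b*x + c*v: linear in x and v, but a parabola as a function of x
print(reg.intercept_, reg.coef_)
```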
Miriam Farber
  • Ok thanks. Now let's say the LinearRegression function has found the optimal parameters a=1, b=2 and c=3; then the function for the first row becomes y = 3x**2 + 2x + 1. And now? 1. What is the LinearRegression doing, because this function is not linear... 2. Further, if the LinearRegression is doing this for each row in the array, is it right that in an n*m array, n linear regressions are computed? And 3. I still don't get how a linear regression can have a curved shape? – 2Obe Jul 13 '17 at 22:27
  • Additional feature means an additional axis, right? So the LinearRegression curve in a two-dimensional coordinate system can look like a curve, but it is actually still a straight line, just in a higher-dimensional space? – 2Obe Jul 13 '17 at 22:50
    @2Obe yes exactly. – Miriam Farber Jul 13 '17 at 22:55