14

Recently I started learning sklearn, numpy and pandas, and I wrote a function for multivariate linear regression. I'm wondering, is it possible to make multivariate polynomial regression?

This is my code for multivariate polynomial regression. It raises this error:

in check_consistent_length " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [8, 3]

Do you know what the problem is?

import numpy as np
import pandas as pd
import xlrd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

def polynomial_prediction_of_future_strenght(input_data, cement, blast_fur_slug,fly_ash,
                                              water, superpl, coarse_aggr, fine_aggr, days):

    variables = prediction_accuracy(input_data)[4]
    results = prediction_accuracy(input_data)[5]

    var_train, var_test, res_train, res_test = train_test_split(variables, results, test_size = 0.3, random_state = 4)

    Poly_Regression = PolynomialFeatures(degree=2)
    poly_var_train = Poly_Regression.fit_transform(var_train)
    poly_var_test = Poly_Regression.fit_transform(var_test)

    input_values = [cement, blast_fur_slug, fly_ash, water, superpl, coarse_aggr, fine_aggr, days]

    regression = linear_model.LinearRegression()
    model = regression.fit(poly_var_train, res_train)

    predicted_strenght = regression.predict([input_values])
    predicted_strenght = round(predicted_strenght[0], 2)

    score = model.score(poly_var_test, res_test)
    score = round(score*100, 2)


    print(prediction, score)

a = polynomial_prediction_of_future_strenght(data_less_than_28days, 260.9, 100.5, 78.3, 200.6, 8.6, 864.5, 761.5, 28)
taga

2 Answers

18

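You can transform your features into polynomial features with sklearn's PolynomialFeatures and then use the transformed features in your linear regression model.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn import linear_model

# Expand the original features into all polynomial terms up to degree 2
poly = PolynomialFeatures(degree=2)
poly_variables = poly.fit_transform(variables)

# Split the expanded features and the targets into train and test sets
poly_var_train, poly_var_test, res_train, res_test = train_test_split(poly_variables, results, test_size = 0.3, random_state = 4)

regression = linear_model.LinearRegression()

# Fit on the training set only, then score on the held-out test set
model = regression.fit(poly_var_train, res_train)
score = model.score(poly_var_test, res_test)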

Also, in your code you are training your model on the entire dataset and then you split it into train and test. This means that your model has already seen your test data while training. You need to split first, then train your model only on training data and then test the score on the test set. I have included these changes as well. :)
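As a minimal sketch of the split-first workflow described above, the block below uses synthetic data in place of the Excel file (the array contents and the `new_sample` values are assumptions made up for illustration, not the asker's data). It also shows one thing visible in the question's code regardless of the reported error: a new sample has to go through the same `PolynomialFeatures` transform before it is passed to `predict`, because the model was fitted on the expanded features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the real data (assumption): 200 rows, 8 features.
rng = np.random.default_rng(0)
variables = rng.uniform(0.0, 1.0, size=(200, 8))
# Target is a degree-2 polynomial of the features, so degree=2 can fit it.
results = variables @ np.arange(1.0, 9.0) + variables[:, 0] * variables[:, 1]

# Split first, so the test rows are never used for fitting.
var_train, var_test, res_train, res_test = train_test_split(
    variables, results, test_size=0.3, random_state=4
)

# Fit the transformer on the training rows, then reuse it everywhere else.
poly = PolynomialFeatures(degree=2)
poly_var_train = poly.fit_transform(var_train)
poly_var_test = poly.transform(var_test)

model = LinearRegression().fit(poly_var_train, res_train)
score = model.score(poly_var_test, res_test)

# A new sample must be 2-D (1 row, 8 columns) and expanded the same way.
new_sample = np.array([[0.5] * 8])
predicted = model.predict(poly.transform(new_sample))
print(round(float(predicted[0]), 2), round(score * 100, 2))
```

Passing the raw 8-value list straight to `predict` would fail here, since the fitted model expects the expanded feature columns rather than the original 8.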

aunsid
panktijk
  • Thanks my friend, but I didn't understand this part: "in your code you are training your model on the entire dataset and then you split it into train and test. This means that your model has already seen your test data while training." Is something wrong with the code that I posted in the question? – taga Feb 26 '19 at 22:08
  • When you train your model on a piece of data, you have to make sure that it will work for other unseen data as well. That is why we first split our dataset into train and test, so that we can train it on the training dataset and check how it performs on test data (which it does not encounter while training). You are training your model before splitting, which means it encounters all the data while training. Your `model.score(var_test, res_test)` will not be an accurate measure of model performance. – panktijk Feb 26 '19 at 22:19
  • Thanks, I understand that now, but I still have a problem with my multivariate regression code. Please check the question, I have updated it. – taga Feb 26 '19 at 22:36
  • Looks like you might have to reshape your input data. Where exactly do you get the error? Try to check `your_data.shape` and if it's something like `(n,)` then you will have to do `your_data.reshape((n, 1))` – panktijk Feb 26 '19 at 22:54
  • I get the error on the last line of code, when I call the function. I get my data from an Excel file with 9 columns (8 with parameters and 1 with the result), then I read it with pandas. The data that I pass to the function as input_data works with the function where I use multivariate linear regression. – taga Feb 26 '19 at 23:10
  • Can you check what values you get for `poly_var_train.shape` and `poly_var_test.shape`? I am guessing the inconsistency appears after you transform your features to polynomial. – panktijk Feb 26 '19 at 23:32
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/189098/discussion-between-aleksandar-and-panktijk). – taga Feb 26 '19 at 23:33
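The shape checks suggested in the comments above can be sketched as follows. The zero-filled arrays are placeholders (an assumption, since the real data is not shown); only the single-sample values come from the question's function call. The sketch does not diagnose the asker's specific error, it only shows what consistent shapes look like.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Placeholder data (assumption): 10 samples with 8 features, 10 targets.
X = np.zeros((10, 8))
y = np.zeros(10)

# X and y must agree on the number of rows (samples) before fitting.
print(X.shape, y.shape)

# Degree-2 expansion of 8 features: 1 bias + 8 linear + 36 quadratic terms.
expanded = PolynomialFeatures(degree=2).fit_transform(X)
print(expanded.shape)  # (10, 45)

# A single sample is 1-D with shape (8,); sklearn expects one row per sample.
single = np.array([260.9, 100.5, 78.3, 200.6, 8.6, 864.5, 761.5, 28])
single_2d = single.reshape(1, -1)  # shape (1, 8): one sample, 8 features
print(single.shape, single_2d.shape)
```

If the first dimension of the features and the targets differ at the point where they are passed to `train_test_split` or `fit`, sklearn raises the "inconsistent numbers of samples" error quoted in the question.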
0

It's not quite clear what you mean by "is it possible to make multivariate polynomial regression", but a pre-made, non-sklearn solution is available in the localreg Python library (full disclosure: I made it).

sigvaldm