
I am doing polynomial regression with scikit-learn and trying to interpret the coefficients. However, scikit-learn doesn't label the output, so it looks like this:

[ 0.,0.95545289,0.,0.20682341,-0.,0.,-0.,-0.,0.,0.,0.,-0.,0.,-0.,-0.,]

How can I map the coefficients to the features that were created? The code I have so far:

poly = PolynomialFeatures(interaction_only=True)
X_ = poly.fit_transform(X_train_minmax)
X_test1 = poly.transform(X_test_minmax)  # reuse the transformer fitted on the training data

lasso_model = linear_model.LassoCV(cv = 10, copy_X = True, normalize = False)
lasso_fit = lasso_model.fit(X_, y_train)
lasso_path = lasso_model.score(X_, y_train)
y_pred= lasso_model.predict(X_test1)
lasso_model.coef_

Thx!

Alanovic

2 Answers


According to the PolynomialFeatures docs:

powers_[i, j] is the exponent of the jth input in the ith output.

So something like this should work:

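a = poly.powers_  # one row per generated feature, one column per input
columns = ['_'.join('x{var}^{exp}'.format(var=var, exp=exp) for var, exp in enumerate(a[i, :])) for i in range(a.shape[0])]
list(zip(columns, lasso_model.coef_))  # pairs of (feature name, coefficient)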

The important line is the first one. :)
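The same idea can be sketched without scikit-learn, to show what the naming step does. The helper names `term_name` and `label_coefficients`, the small `powers` matrix, and the coefficient values below are all hypothetical and only for illustration; with a fitted transformer you would presumably pass `poly.powers_` and `lasso_model.coef_` instead.

```python
def term_name(exponents, input_names):
    """Build a readable name such as 'x0*x1' or 'x0^2' from one row of a powers matrix."""
    parts = []
    for name, exp in zip(input_names, exponents):
        if exp == 0:
            continue  # this input does not appear in the term
        parts.append(name if exp == 1 else "{}^{}".format(name, exp))
    # A row of all zeros corresponds to the constant (bias) column.
    return "*".join(parts) if parts else "1"


def label_coefficients(powers, coefs, input_names):
    """Pair every coefficient with the name of the term it belongs to."""
    return [(term_name(row, input_names), c) for row, c in zip(powers, coefs)]


# Hypothetical powers matrix for two inputs with interaction terms only:
# the rows stand for the bias column, x0, x1 and x0*x1.
powers = [[0, 0], [1, 0], [0, 1], [1, 1]]
coefs = [0.0, 0.95545289, 0.0, 0.20682341]  # illustrative values only

labelled = label_coefficients(powers, coefs, ["x0", "x1"])
# With Lasso most coefficients are exactly zero, so keep only the active terms.
nonzero = [(name, c) for name, c in labelled if c != 0]
print(nonzero)
```

Skipping zero exponents keeps the names short, which helps when only a handful of Lasso coefficients are nonzero.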

dukebody

Let us assume you are running a 2nd degree polynomial regression. So,

poly = PolynomialFeatures(degree=2)  # create the polynomial feature transformer
X_ = poly.fit_transform(input_data)  #ndarray to be used for regression.

where input_data = [X1, X2, X3, ...]  # actually a 2D ndarray with one column per input, written as a list for simplicity

To find the indices in lasso_model.coef_ of the terms that involve a given input, say X1 (i.e. X1, X1**2, X1*X2, X1*X3, ..., X1*Xn), use the following:

list_of_index = []

for j in range(input_data.shape[1]):  # iterate over each input: X1, X2, etc.
    temp = []
    for i in range(X_.shape[1]):  # iterate over the columns of the polynomial ndarray
        if poly.powers_[i, j] != 0:
            temp.append(i)
    list_of_index.append(temp)

list_of_index will be a list of lists containing the indices of the columns that have X1, X2, etc. as a factor.

Example:

For a 2nd degree regression using only X1 and X2, the generated ndarray will be [1 , X1, X2, X1**2, X1*X2, X2**2]

list_of_index would be [[1,3,4],[2,4,5]]

You could use these indices to pick the matching entries out of lasso_model.coef_.
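As a rough check of the example above, here is a minimal sketch that needs only plain Python. The `powers` matrix is written out by hand to match the column order [1, X1, X2, X1**2, X1*X2, X2**2], and the coefficient values are made up; `indices_per_input` is a hypothetical helper name, and in practice `poly.powers_` and `lasso_model.coef_` would take the place of the hand-written lists.

```python
def indices_per_input(powers):
    """For each input j, list the feature indices whose exponent for j is nonzero."""
    n_inputs = len(powers[0])
    return [
        [i for i, row in enumerate(powers) if row[j] != 0]
        for j in range(n_inputs)
    ]


# Hand-written exponent rows for [1, X1, X2, X1**2, X1*X2, X2**2].
powers = [[0, 0], [1, 0], [0, 1], [2, 0], [1, 1], [0, 2]]
list_of_index = indices_per_input(powers)
print(list_of_index)

# Made-up coefficients, one per generated feature (illustrative only).
coefs = [0.0, 0.9, 0.0, 0.0, 0.2, 0.0]
x1_coefs = [coefs[i] for i in list_of_index[0]]  # coefficients of terms containing X1
x2_coefs = [coefs[i] for i in list_of_index[1]]  # coefficients of terms containing X2
print(x1_coefs, x2_coefs)
```

The interaction term X1*X2 (index 4) shows up in both lists, so its coefficient is counted once for each input it involves.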