
For my regression problem, I am using scikit-learn's GridSearchCV to find the best alpha value, which I then use in my estimator (Lasso, Ridge, ElasticNet). The target values in my training dataset do not contain any negative values, but some of the predicted values are negative (around 5-10%). I am using the following code. My training data contains some null values, which I replace with the mean of that feature.

return Lasso(alpha=best_parameters['alpha']).fit(X,y).predict(X_test)

Any idea why I am getting some negative values? The shapes of X, y, and X_test are (20L, 400L), (20L,), and (10L, 400L).
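For reference, the surrounding code presumably looks something like the sketch below (only the final line appears above; the helper name, the alpha grid, the imputation step, and the sklearn.model_selection import path are assumptions):

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import GridSearchCV

    def fit_and_predict(X, y, X_test):
        # Impute missing values with the per-feature mean, as described above.
        col_means = np.nanmean(X, axis=0)
        X = np.where(np.isnan(X), col_means, X)

        # Cross-validated search over the regularization strength alpha.
        grid = GridSearchCV(Lasso(), {'alpha': [0.001, 0.01, 0.1, 1.0, 10.0]})
        grid.fit(X, y)
        best_parameters = grid.best_estimator_.get_params()

        return Lasso(alpha=best_parameters['alpha']).fit(X, y).predict(X_test)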

user644745

1 Answer


Lasso is just regularized linear regression, so in fact for every trained model there are inputs for which the prediction will be negative.

Consider a linear function

f(x) = w'x + b

where w and x are vectors and ' is the transposition operator.

No matter what the values of w and b are, as long as w is not the zero vector there are always values of x for which f(x) < 0. It does not matter that the training set used to compute w and b did not contain any negative targets: the linear model always crosses zero somewhere (possibly at very large input values), so test points that lie far enough from the training data can produce negative predictions.
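As a quick illustration, here is a minimal sketch (with made-up data, not the dataset from the question) showing that a Lasso model trained on strictly non-negative targets still predicts negative values for inputs far from the training data:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.RandomState(0)

    # Synthetic data, with targets shifted so every training y is >= 0.
    X = rng.randn(20, 5)
    w_true = rng.randn(5)
    y = X @ w_true
    y = y - y.min()                     # all training targets non-negative

    model = Lasso(alpha=0.01).fit(X, y)

    # f(x) = w'x + b is unbounded below whenever w != 0, so walking far
    # enough against the weight vector yields a negative output.
    x_far = (-100 * model.coef_).reshape(1, -1)
    print(model.predict(x_far))         # prints a negative value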

lejlot
  • Thanks for the explanation. Any idea what would be a better estimator when my features (originally 1500, reduced to 400 after feature selection) outnumber my samples? I used AdaBoostRegressor with DecisionTreeRegressor and n_estimators=300; it did not improve the score much, but at least it did not produce any negative values. – user644745 Nov 17 '13 at 17:04
  • To be honest, the best solution is to gather more data. With such a small dataset you cannot expect any reasonable results. – lejlot Nov 17 '13 at 17:34