Why does my pipeline throw FitFailedWarning when I use LabelEncoder in it?

Question

I am new to machine learning and am working on a project to keep myself busy, so I don't know much about how sklearn works. The main objective is to train a model to predict a categorical variable. When I tried label-encoding the y variable of my model, I got the following error:

ValueError: not enough values to unpack (expected 3, got 2)

  FitFailedWarning)

Here is the code I am using:

# Rough training
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_predict, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder

# All columns except the target (a != comparison, not a substring test on 'type1')
cols_to_use = [col for col in formatData.columns if col != 'type1']
x = formatData[cols_to_use]
y = formatData.type1
#print(x.columns)
#print(y)

numerical_transformer = SimpleImputer(strategy='constant')
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('label', LabelEncoder())
])

preprocessor = ColumnTransformer(transformers=[('num', numerical_transformer),
                                               ('cat', categorical_transformer)])

my_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                              ('model', RandomForestRegressor(n_estimators=50, random_state=0))])

cv_results = cross_validate(my_pipeline, x, y, cv=5, scoring=('r2', 'neg_mean_absolute_error'))

predictions = cross_val_predict(my_pipeline, x, y, cv=5)
print(cv_results['test_neg_mean_absolute_error'])
print(predictions)

Any help is appreciated; if you need more information, please comment.
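For reference, the exact ValueError quoted above is what ColumnTransformer raises when its transformers entries are (name, transformer) pairs: it expects (name, transformer, columns) triples, and the code above never says which columns each transformer applies to. A minimal reproduction, assuming a tiny made-up DataFrame (the column names height and color are hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for formatData.
df = pd.DataFrame({
    "height": [1.0, np.nan, 3.0],
    "color": pd.Series(["red", "blue", np.nan], dtype=object),
})

# (name, transformer) pairs, as in the question: the column lists are missing.
bad = ColumnTransformer(transformers=[
    ("num", SimpleImputer(strategy="constant")),
    ("cat", SimpleImputer(strategy="most_frequent")),
])

try:
    bad.fit_transform(df)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # not enough values to unpack (expected 3, got 2)

# (name, transformer, columns) triples fit without that error.
good = ColumnTransformer(transformers=[
    ("num", SimpleImputer(strategy="constant"), ["height"]),
    ("cat", SimpleImputer(strategy="most_frequent"), ["color"]),
])
result = good.fit_transform(df)
print(result.shape)  # (3, 2)
```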

Answer (score 0), answered Aug 10 '20 at 01:38

Pipelines are designed to transform X, not y. (This limitation has been discussed, notably for resamplers, which need to change the rows of X and y together; see imblearn for a fix in at least that direction.)
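To see that a pipeline step is handed X rather than y, here is a minimal sketch with a hypothetical one-column feature array and label vector. In recent scikit-learn versions the failure surfaces as a TypeError, because LabelEncoder.fit_transform accepts a single argument while the pipeline calls it with (X, y); the sketch also catches ValueError in case other versions fail differently.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: one categorical feature column and a label vector.
X = np.array([["red"], ["blue"], ["red"]], dtype=object)
y = np.array(["a", "b", "a"])

# LabelEncoder used as a pipeline step, as in the question.
pipe = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("label", LabelEncoder()),
])

try:
    # The pipeline passes the imputed 2D X (plus y) to LabelEncoder.
    pipe.fit_transform(X, y)
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)

# Called directly on the 1D target, LabelEncoder works as intended.
y_encoded = LabelEncoder().fit_transform(y)
print(y_encoded)  # [0 1 0]
```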

In particular, the transformer steps of a pipeline are fitted with fit_transform(X, y), which by default is defined as fit(X, y).transform(X). So a LabelEncoder in a pipeline is handed X, not y, and it fails because its methods accept only a single 1D array of labels, not a 2D feature matrix. You should just label encode y outside of the pipeline.
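A sketch of that fix, assuming a small synthetic stand-in for formatData (the feature columns height, weight and color are hypothetical; type1 is the target, as in the question). y is label-encoded once, before the pipeline, and predictions are decoded with inverse_transform. The other changes are assumptions beyond the answer itself: each ColumnTransformer entry gets its column list, the categorical features are one-hot encoded with OneHotEncoder instead of LabelEncoder, and because the target is categorical, RandomForestClassifier with accuracy scoring replaces RandomForestRegressor with r2/MAE.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_predict, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical synthetic data; "type1" is the categorical target.
rng = np.random.default_rng(0)
n = 60
formatData = pd.DataFrame({
    "height": rng.normal(1.0, 0.3, n),
    "weight": rng.normal(50.0, 10.0, n),
    "color": pd.Series(rng.choice(["red", "blue", "green"], n), dtype=object),
    "type1": rng.choice(["fire", "water", "grass"], n),
})
formatData.loc[::7, "height"] = np.nan   # some missing numeric values
formatData.loc[::9, "color"] = np.nan    # some missing categorical values

x = formatData.drop(columns="type1")

# Encode the target once, outside the pipeline.
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(formatData["type1"])

numerical_cols = x.select_dtypes(include="number").columns.tolist()
categorical_cols = x.select_dtypes(exclude="number").columns.tolist()

numerical_transformer = SimpleImputer(strategy="constant")
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# Each entry is a (name, transformer, columns) triple.
preprocessor = ColumnTransformer(transformers=[
    ("num", numerical_transformer, numerical_cols),
    ("cat", categorical_transformer, categorical_cols),
])

my_pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])

cv_results = cross_validate(my_pipeline, x, y, cv=5, scoring="accuracy")
predictions = cross_val_predict(my_pipeline, x, y, cv=5)

# Map integer predictions back to the original class names.
predicted_labels = label_encoder.inverse_transform(predictions)
print(cv_results["test_score"])
print(predicted_labels[:5])
```

scikit-learn classifiers such as RandomForestClassifier also accept string class labels directly, so the explicit encoding of y mainly matters when something downstream needs integer codes.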

