I want to run sklearn.ensemble.BaggingClassifier with CatBoostClassifier as the base estimator, but my data has some categorical features, and this causes the following error:

CatBoostError: 'data' is numpy array of floating point numerical type, it means no categorical features, but 'cat_features' parameter specifies nonzero number of categorical features

from catboost import CatBoostClassifier
from sklearn.ensemble import BaggingClassifier

clf = CatBoostClassifier(task_type='GPU',
                         n_estimators=8000,
                         early_stopping_rounds=5,
                         verbose=250,
                         cat_features=['avg_day_of_week', 'avg_month', 'mode_subgroup', 'mode_small_group'])

bag_clf = BaggingClassifier(clf, n_estimators=10, max_samples=0.8)
bag_clf.fit(X_train.drop('client_id', axis=1), y_train)

Is it possible to overcome this problem?

1 Answer

The official CatBoost documentation explains why CatBoost can't handle float values (including NaNs) as categorical features.

A simple workaround in your case would be to convert your NumPy matrix to a pd.DataFrame:

import pandas as pd

bag_clf.fit(pd.DataFrame(X_train.drop('client_id', axis=1)), y_train)
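
If that still raises the same error, a likely cause (as far as I can tell) is that BaggingClassifier validates its input and hands each base estimator a plain NumPy array, so the column names and dtypes never reach CatBoost. A minimal sketch of doing the bagging by hand, so every model is fitted on a DataFrame slice, could look like this. The helpers fit_bagged and predict_proba_bagged and the make_estimator factory are hypothetical names for illustration, not part of sklearn or CatBoost:

```python
import numpy as np
import pandas as pd


def fit_bagged(make_estimator, X, y, n_estimators=10, max_samples=0.8, seed=0):
    """Fit n_estimators models, each on a bootstrap sample of the DataFrame X.

    make_estimator is a hypothetical zero-argument factory that returns a
    fresh, unfitted model (e.g. a CatBoostClassifier with cat_features set).
    """
    rng = np.random.default_rng(seed)
    n_rows = int(max_samples * len(X))
    y_arr = np.asarray(y)
    models = []
    for _ in range(n_estimators):
        # Draw row positions with replacement; .iloc keeps column names and dtypes.
        idx = rng.integers(0, len(X), size=n_rows)
        model = make_estimator()
        model.fit(X.iloc[idx], y_arr[idx])
        models.append(model)
    return models


def predict_proba_bagged(models, X):
    """Average the class probabilities of the fitted models.

    Assumes every bootstrap sample contained every class, so all models
    return probability arrays of the same shape.
    """
    return np.mean([m.predict_proba(X) for m in models], axis=0)
```

With the classifier from the question, make_estimator could be something like lambda: CatBoostClassifier(cat_features=[...]), and X would be X_train.drop('client_id', axis=1). CatBoost expects categorical columns to hold integers or strings, so any float-typed categorical columns would need casting first.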
Andreyn