I'm trying to use CatBoost with my columns declared as categorical features, but I get the following error:
CatBoostError: Invalid type for cat_feature[non-default value idx=0,feature_idx=2]=68892500.0 : cat_features must be integer or string, real number values and NaN values should be converted to string.
I could use one-hot encoding, but many people here say CatBoost is better at handling categorical features and less prone to overfitting.
My data consists of three columns: 'Country', 'year' and 'phone users'. The target is 'Country'; 'year' and 'phone users' are the features.
Data:
Country  year  phone users
Ireland  1989  978
France   1990  854
Spain    1991  882
Turkey   1992  457
...      ...   ...
My code so far:
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

X_pool = df.loc[115:305]
y_pool = df.loc[80:, 0]
# Mark every column of X_pool as categorical
cat_features = list(range(0, X_pool.shape[1]))
print(cat_features)
# Output: [0, 1, 2]
X_train, X_val, y_train, y_val = train_test_split(X_pool, y_pool,
                                                  test_size=0.2, random_state=0)
cbc = CatBoostClassifier(iterations=5, learning_rate=0.1)
cbc.fit(X_train, y_train, eval_set=(X_val, y_val),
        cat_features=cat_features, verbose=False)
print("Model Evaluation Stage")
Do I need to run LabelEncoder before fitting the CatBoost model? What am I missing here?
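Going only by the wording of the error, here is a minimal sketch of what I understand it to be asking for: cast any float column to strings before listing it in cat_features. The helper name `stringify_columns` and the small frame are made up for illustration (the rows come from the table above, stored as floats as the error suggests), and I'm not sure that treating a numeric count like 'phone users' as categorical is the right idea in the first place.

```python
import pandas as pd

# Sample rows from the table above; 'phone users' is stored as float,
# which is what the error message suggests about my real data.
df = pd.DataFrame({
    "Country": ["Ireland", "France", "Spain", "Turkey"],
    "year": [1989, 1990, 1991, 1992],
    "phone users": [978.0, 854.0, 882.0, 457.0],
})


def stringify_columns(frame, columns):
    """Hypothetical helper: return a copy with the given columns cast to str."""
    out = frame.copy()
    for col in columns:
        out[col] = out[col].astype(str)
    return out


features = df[["year", "phone users"]]
cat_cols = ["phone users"]  # assumption: only this float column is declared categorical
prepared = stringify_columns(features, cat_cols)

print(prepared["phone users"].tolist())
```

If this is the right direction, I assume `prepared` could then be passed to `fit` with `cat_features=cat_cols`, but I have not confirmed that this is the recommended way.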