
I have the following code:

from sklearn.model_selection import KFold, cross_val_predict
from catboost import Pool, CatBoostRegressor, cv
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
data = pd.read_csv("winemag-data-130k-v2.csv")
data = data.dropna(subset = ['price', 'country'])
x = data.drop(columns = ['points']).fillna(0)
y = data['points']
categorical_features = np.where(x.dtypes == object)[0]
model = CatBoostRegressor(random_seed = 400, iterations = 400)
model.fit(x, y, cat_features = categorical_features)
y_pred = cross_val_predict(model, x, y, cv = KFold(n_splits = 10, shuffle = True, random_state = 0))
print("MSE: {:.2f}".format(mean_squared_error(y, y_pred)))

The following error occurs on the line that computes y_pred (the cross_val_predict call):

CatBoostError: Bad value for num_feature[non_default_doc_idx=0,feature_idx=1]="Portugal": Cannot convert 'b'Portugal'' to float

What am I doing wrong?

Nikolay
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Apr 03 '23 at 01:52

0 Answers