I connected a model trained with CatBoost to my bot, but when I enter data for analysis, the following error is raised:
raise CatBoostError("Invalid {}[{}] = {} value: index must be < {}.".format(features_name, indx, feature, features_count)) _catboost.CatBoostError: Invalid cat_features[1] = 8 value: index must be < 8.
This is what the model code looks like:
`import pandas as pd
df = pd.read_excel(r'D:\Programming\Telegram_counter_bot\TG_data.xlsx')
df = df.head(1000)
df = df.rename(columns = {'Views (k)':'Views'})
df['Time'].replace('С','C', inplace=True)
df = df.fillna(method='ffill')
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.2,random_state=42)
val, test = train_test_split(test, train_size=0.5,random_state=42)
one_hot_tr = pd.get_dummies(train['Time'])
one_hot_t = pd.get_dummies(test['Time'])
one_hot_v = pd.get_dummies(val['Time'])
train = pd.concat([train, one_hot_tr], axis=1)
test = pd.concat([test, one_hot_t], axis=1)
val = pd.concat([val, one_hot_v], axis=1)
X = ['Picture', 'Picture_text', 'Video', 'Header', 'Characters',
     'Text formatting', 'Emoji', 'A',
     'B', 'C', 'D']
cat_features = ['A','B', 'C', 'D']
y = ['Views']
from catboost import CatBoostRegressor
from catboost import Pool
train_data = Pool(data=train[X],
                  label=train[y],
                  cat_features=cat_features
                  )
valid_data = Pool(data=val[X],
                  label=val[y],
                  cat_features=cat_features
                  )
train_full = pd.concat([train, val])
train_full_data = Pool(train_full[X],
                       label=train_full[y],
                       cat_features=cat_features)
params = {'iterations': 582,
          'eval_metric': 'MAE',
          'loss_function': 'MAE',
          'random_seed': 42,
          'verbose': 100,
          'learning_rate': 0.01}
model = CatBoostRegressor(**params)
model.fit(train_full_data)
y_pred = model.predict(test[X])`
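One thing that may be worth checking in the code above: `pd.get_dummies` is called separately on `train`, `test` and `val`, so a split that happens to lack one of the `Time` values would end up without that column. A minimal sketch of one way to guard against this, assuming the full set of values is `A`, `B`, `C`, `D` (the helper `one_hot_time` and the constant `TIME_LEVELS` are hypothetical names, not part of the original code):

```python
import pandas as pd

# Assumption: these are all the values the 'Time' column can take.
TIME_LEVELS = ['A', 'B', 'C', 'D']


def one_hot_time(frame):
    """Return dummy columns for 'Time', one per level in TIME_LEVELS."""
    # Casting to a categorical dtype with explicit categories makes
    # get_dummies emit every level, even levels absent from this frame.
    time = frame['Time'].astype(pd.CategoricalDtype(categories=TIME_LEVELS))
    return pd.get_dummies(time)


# Toy frame that contains only two of the four levels.
sample = pd.DataFrame({'Time': ['A', 'D', 'A']}, index=[10, 11, 12])
dummies = one_hot_time(sample)
print(dummies.columns.tolist())
```

The same helper could be applied to each split, so that all three frames carry the same four dummy columns regardless of which values each split contains.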
And here is the function in the bot that is responsible for it:
`def process_user_data(params):
    data = pd.DataFrame([params], columns=['Picture', 'Picture_text', 'Video', 'Header', 'Characters',
                                           'Text formatting', 'Emoji', 'Time'])
    one_hot = pd.get_dummies(data['Time'])
    data = pd.concat([data, one_hot], axis=1)
    data = data.drop('Time', axis=1)
    return data`
I tried replacing A, B, C, D in cat_features with their column indices. I also tried several input formats in the bot, and `/analyze 1; 0; 0; 1; 159; 1; 1; D` came closest to working.
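A possible reading of the error, offered as a sketch rather than a confirmed diagnosis: the model was fitted on 11 columns, with the dummy columns A to D at positions 7 to 10, while `pd.get_dummies` on a single-row frame produces only the column for the value that is present. For the input `D` the bot's frame would then have 8 columns, so position 8 (the second categorical feature) is out of range, which matches "index must be < 8". The names `process_user_data_aligned`, `BASE_COLUMNS`, `TIME_LEVELS` and `MODEL_COLUMNS` below are hypothetical, and the integer dtype of the dummy columns is an assumption that should be matched to whatever the training data used.

```python
import pandas as pd

# Column layout taken from the list X in the training code.
BASE_COLUMNS = ['Picture', 'Picture_text', 'Video', 'Header', 'Characters',
                'Text formatting', 'Emoji']
TIME_LEVELS = ['A', 'B', 'C', 'D']
MODEL_COLUMNS = BASE_COLUMNS + TIME_LEVELS


def process_user_data_aligned(params):
    """Hypothetical variant of process_user_data that always returns 11 columns."""
    data = pd.DataFrame([params], columns=BASE_COLUMNS + ['Time'])
    # For a single row, get_dummies creates only the column for the value present.
    one_hot = pd.get_dummies(data['Time'])
    # Add the missing dummy columns as 0 and put them in the training order.
    one_hot = one_hot.reindex(columns=TIME_LEVELS, fill_value=0).astype(int)
    data = pd.concat([data.drop('Time', axis=1), one_hot], axis=1)
    return data[MODEL_COLUMNS]


row = [1, 0, 0, 1, 159, 1, 1, 'D']

# What the original function effectively builds: 7 base columns plus only 'D'.
naive = pd.get_dummies(pd.DataFrame([row], columns=BASE_COLUMNS + ['Time'])['Time'])
aligned = process_user_data_aligned(row)

print(list(naive.columns))
print(aligned.shape)
```

With the columns aligned this way, the frame passed to `model.predict` has the same width and order as the training data, so the categorical feature positions stay within range.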