I'm trying to learn the basics of XGBoost and devised a script that splits some data I found on Kaggle about coronavirus outbreaks in China. The code and model work, but for some reason when I use the model to make a new prediction I get a "ValueError: feature_names mismatch." The new input is a 2-D array with 2 values per row, just like the test data, but I still get the error.
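import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# df is the Kaggle data, already loaded as a pandas DataFrame
train = df[['RegionCode','ProvinceCode']].astype(int)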
test = df['infected'].astype(int)
# train_test_split returns the splits in the order: X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(train, test, test_size=0.2, random_state=42)
train = xgb.DMatrix(X_train, label=y_train)
test = xgb.DMatrix(X_test, label=y_test)
param = {
    'max_depth': 4,
    'eta': 0.3,
    'num_class': 2,
}
epochs = 10
model = xgb.train(param, train, epochs)
All the code above works, but the test below gives me the error:
testArray=np.array([[13, 67]])
test_individual = xgb.DMatrix(testArray)
print(model.predict(test_individual))
Any idea what I'm doing wrong?