Hi, I get the following error:
Error in predict.randomForest(classifier, newdata = grid_set) :
variables in the training data missing in newdata
when I run the following code:
classifier = randomForest(x = training_set[-3],
                          y = training_set$Purchased,
                          ntree = 10)
set = training_set[-3]
X1 = seq(min(set[, 1]) - 1, max(set[, 1]) + 1, by = 0.01)
X2 = seq(min(set[, 2]) - 1, max(set[, 2]) + 1, by = 0.01)
grid_set = expand.grid(X1, X2)
colnames(grid_set) = c('Age', 'Estimated Salary')
ygrid = predict(classifier, newdata = grid_set)
The training set has a third column, Purchased, which is a categorical variable. I thought I had removed it from the predictors with training_set[-3]. Does this not remove that column? Adding a third column, 'X3', to grid_set to stand in for the Purchased column did not solve the issue either.
I am wondering whether I need a different way of removing the Purchased column from x in the training data, or whether I am going wrong elsewhere.
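For anyone trying to reproduce this, here is a minimal sketch in base R of two checks that seem relevant: whether training_set[-3] really drops the third column, and whether the names given to grid_set match the predictor names exactly. The data frame below is a hypothetical stand-in, and the column names Age, EstimatedSalary and Purchased are an assumption, not taken from the real data. As far as I understand, predict.randomForest matches newdata to the training predictors by column name, so any name in the training set that is absent from newdata would trigger this error.

```r
# Hypothetical stand-in for training_set; the real column names are an assumption.
training_set <- data.frame(
  Age = c(19, 35, 26, 27),
  EstimatedSalary = c(19000, 20000, 43000, 57000),
  Purchased = factor(c(0, 0, 0, 1))
)

# Negative indexing on a data frame drops the third column.
set <- training_set[-3]
print(names(set))

# Build a small grid and name its columns as in the code above.
grid_set <- expand.grid(seq(18, 20, by = 1), seq(19000, 19002, by = 1))
colnames(grid_set) <- c('Age', 'Estimated Salary')

# Predictor names present in the training data but absent from the grid.
missing_vars <- setdiff(names(set), names(grid_set))
print(missing_vars)

# Reusing the training names keeps the grid and the predictors in sync.
colnames(grid_set) <- names(set)
```

If missing_vars comes back non-empty on the real data, the mismatch would be in the grid's column names rather than in how Purchased was removed.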