
I have a training set with a response variable ViolentCrimesPerPop, and I purposely fit a large regression tree with the following control settings:

control1 <- rpart.control(minsplit=2, cp=1e-8, xval=20)

train_control <- rpart(ViolentCrimesPerPop ~ ., data=train, method='anova', control=control1)

Then I use it to predict on the test set:

predict1 <- predict(train_control, newdata=test)

However, I'm not sure how to compute the mean squared error on the test set, because it requires the response variable ViolentCrimesPerPop, which is not given in the test set. Can someone give me a hint on how to approach this problem?

PiCubed
  • I think you answered your own question. Computing `f(x,y)` requires `x`, which you don't have, so you can't find `f(x,y)`. – Julius Vainora Oct 23 '18 at 21:39

3 Answers


You can compute the MSE only if you know the ground truth. If you don't have the test labels, the only option is to train your model on 70 or 80% of the training data and compute the MSE on the remaining 20 to 30%.
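A minimal sketch of such a hold-out split, reusing the control settings from the question. The built-in `mtcars` data and its `mpg` column are only stand-ins (an assumption for illustration) for the asker's `train` data frame and `ViolentCrimesPerPop`; the 30% hold-out fraction and the names `fit_part` and `validation` are likewise arbitrary.

```r
library(rpart)

# Stand-in data: mtcars with mpg as the response. Replace with your own
# labelled training data and ViolentCrimesPerPop.
set.seed(42)
train <- mtcars

# Hold out 30% of the labelled rows as a validation set.
n <- nrow(train)
holdout_idx <- sample(n, size = round(0.3 * n))
fit_part <- train[-holdout_idx, ]
validation <- train[holdout_idx, ]

# Same control settings as in the question.
control1 <- rpart.control(minsplit = 2, cp = 1e-8, xval = 20)
fit <- rpart(mpg ~ ., data = fit_part, method = "anova", control = control1)

# MSE on rows the tree never saw, where the true response is known.
pred <- predict(fit, newdata = validation)
mse <- mean((validation$mpg - pred)^2)
print(mse)
```

The resulting number estimates how the tree would do on unseen data; it is not the MSE of the unlabelled test set itself, which stays unknown.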

Ashok KS

You won't be able to calculate the MSE for the test set if you don't know the ground truth (the response variable). However, it is possible that you were asked to split a dataset that contains the ground truth into train and test sets; in that case, you can easily compute the MSE.
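For reference, the standard definition makes the dependence on the ground truth explicit. The symbols here are notation introduced for illustration: $y_i$ is the observed response for test row $i$, $\hat{y}_i$ is the tree's prediction for that row, and $n$ is the number of test rows.

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$

Without the $y_i$ values, the sum cannot be evaluated, no matter how the predictions were produced.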

12b345b6b78

Are you working on some Kaggle tests that do not provide the response variable for the test set?

Regardless, try splitting your training set into new subsets: use one part for training and the rest for testing your model. You cannot assess model performance without the response variable.
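One way to use several such subsets is k-fold cross-validation, sketched below under stated assumptions: `mtcars` with `mpg` again stands in for the labelled training data and `ViolentCrimesPerPop`, and `k = 5` is an arbitrary choice. Each fold is held out once while the tree is fit on the remaining rows.

```r
library(rpart)

set.seed(1)
train <- mtcars   # stand-in for the labelled training data
k <- 5            # number of folds (arbitrary choice)

# Randomly assign every row to one of k folds of near-equal size.
fold <- sample(rep(seq_len(k), length.out = nrow(train)))

fold_mse <- numeric(k)
for (i in seq_len(k)) {
  # Fit on all folds except i; xval = 0 skips rpart's internal cross-validation.
  fit <- rpart(mpg ~ ., data = train[fold != i, ], method = "anova",
               control = rpart.control(minsplit = 2, cp = 1e-8, xval = 0))
  # Score on the held-out fold, where the true response is known.
  pred <- predict(fit, newdata = train[fold == i, ])
  fold_mse[i] <- mean((train$mpg[fold == i] - pred)^2)
}

# Average of the per-fold MSEs.
cv_mse <- mean(fold_mse)
print(cv_mse)
```

Compared with a single split, this uses every labelled row for both fitting and evaluation, at the cost of fitting the tree k times.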

Xiaoyu Lu