Calculate MSE for a test set that's missing the response variable

Question

I have a training set with a response variable ViolentCrimesPerPop, and I purposely fit a large regression tree with the following control settings:

```r
library(rpart)

control1 <- rpart.control(minsplit=2, cp=1e-8, xval=20)

train_control <- rpart(ViolentCrimesPerPop ~ ., data=train, method='anova', control=control1)
```

Then I use it to predict the test set:

```r
predict1 <- predict(train_control, newdata=test)
```

However, I'm not sure how to compute the mean squared error (MSE) on the test set, because it requires the response variable ViolentCrimesPerPop, which the test set does not include. Can someone give me a hint on how to approach this problem?
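For reference, the test-set MSE over n test observations compares each observed response y_i (here, ViolentCrimesPerPop) with the tree's prediction ŷ_i (the entries of `predict1`), so every term needs the observed value:

```latex
\mathrm{MSE}_{\text{test}} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```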

I think you answered your own question. Computing `f(x,y)` requires `x`, which you don't have, so you can't compute `f(x,y)`. - Julius Vainora, Oct 23 '18 at 21:39

score 1 · Answer 1 · answered Oct 30 '18 at 13:11

You can compute the MSE only if you know the ground truth. If you don't have the test labels, the only option is to train your model on 70 to 80% of the training data and compute the MSE on the remaining 20 to 30%.
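A minimal sketch of this holdout approach in R, assuming the `rpart` package and an 80/20 split. The data frame below is simulated as a hypothetical stand-in for the asker's `train` (only the column name `ViolentCrimesPerPop` and the control settings come from the question), and `fit_part`, `valid_part`, and `valid_mse` are illustrative names:

```r
library(rpart)

# Hypothetical simulated stand-in for the asker's `train` data frame:
# two predictors plus a numeric response with the question's column name.
set.seed(42)
n <- 500
train <- data.frame(x1 = runif(n), x2 = runif(n))
train$ViolentCrimesPerPop <- 0.5 * train$x1 + 0.3 * train$x2 + rnorm(n, sd = 0.05)

# Randomly keep 80% of the labelled rows for fitting, hold out the other 20%.
idx <- sample(seq_len(nrow(train)), size = floor(0.8 * nrow(train)))
fit_part <- train[idx, ]
valid_part <- train[-idx, ]

# Same control settings as in the question.
control1 <- rpart.control(minsplit = 2, cp = 1e-8, xval = 20)
fit <- rpart(ViolentCrimesPerPop ~ ., data = fit_part,
             method = "anova", control = control1)

# MSE on the held-out rows, whose true response is known.
pred <- predict(fit, newdata = valid_part)
valid_mse <- mean((valid_part$ViolentCrimesPerPop - pred)^2)
valid_mse
```

Because `valid_part` keeps its true response, the squared errors can be averaged directly. The value depends on the random split, so fixing the seed makes it reproducible.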

Ashok KS

659
5
21

score 0 · Answer 2 · answered Oct 23 '18 at 21:39

You won't be able to calculate the MSE for the test set if you don't know the ground truth (the response variable). However, it's possible that you were asked to split a dataset that contains the ground truth into training and test sets; in that case, you can easily compute the MSE.
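When the test set does carry the response, the computation is a single average of squared errors. The sketch below wraps it in a hypothetical helper, `test_mse()`, that stops with an error when the response column is absent, which is exactly the situation in the question. The data are simulated for illustration only, and `rpart` is fitted with its default control settings:

```r
library(rpart)

# Hypothetical helper: test-set MSE for a fitted model. It refuses to run
# when the response column is missing, because there is no ground truth.
test_mse <- function(model, test, response) {
  if (!response %in% names(test)) {
    stop("column '", response, "' is missing from the test set; MSE cannot be computed")
  }
  pred <- predict(model, newdata = test)
  mean((test[[response]] - pred)^2)
}

# Simulated labelled data, split into training and test rows (illustrative only).
set.seed(1)
d <- data.frame(x = runif(200))
d$ViolentCrimesPerPop <- 2 * d$x + rnorm(200, sd = 0.1)
train <- d[1:150, ]
test <- d[151:200, ]

fit <- rpart(ViolentCrimesPerPop ~ ., data = train, method = "anova")
test_mse(fit, test, "ViolentCrimesPerPop")
```

Calling `test_mse()` on a test set that lacks `ViolentCrimesPerPop` raises an error instead of returning a misleading number.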

score 0 · Answer 3 · answered Oct 23 '18 at 21:39

Are you working on some Kaggle tests that do not provide the response variable for the test set?

Regardless, try splitting your training set into new subsets: use one part for training and the rest to test your model. You cannot assess the model's performance without the response variable.
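A related sketch: the question's `control1` already sets `xval=20`, so `rpart` runs 20-fold cross-validation on the training data while fitting, which repeatedly holds out part of the training set as suggested here. In the fitted object's `cptable`, the `xerror` column is scaled relative to the root node error; multiplying it by the root node error (`frame$dev[1]` divided by the number of rows, for an unweighted `anova` tree) gives an estimate on the MSE scale. The data below are again simulated and hypothetical:

```r
library(rpart)

# Hypothetical simulated training data with the question's response name.
set.seed(7)
n <- 300
train <- data.frame(x1 = runif(n), x2 = runif(n))
train$ViolentCrimesPerPop <- sin(3 * train$x1) + train$x2 + rnorm(n, sd = 0.1)

# Same control settings as in the question: xval = 20 means 20-fold CV.
control1 <- rpart.control(minsplit = 2, cp = 1e-8, xval = 20)
fit <- rpart(ViolentCrimesPerPop ~ ., data = train,
             method = "anova", control = control1)

# Root node error: total sum of squares at the root divided by the row count.
root_error <- fit$frame$dev[1] / nrow(train)

# xerror is relative to the root node error; rescale it to squared-error units.
cv_mse <- fit$cptable[, "xerror"] * root_error

# Cross-validated MSE of the largest tree (last row) and of the best subtree.
cv_mse[length(cv_mse)]
min(cv_mse)
```

For a tree grown this deep, the cross-validated error of the largest tree is typically higher than the minimum over subtrees, which is why the `cp` column of the same table is commonly used to choose a pruned tree.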