
I would like to evaluate the performance of a GAM at predicting novel data using five-fold cross-validation. The model is trained on a random 80% subset of the data, with the remaining 20% held out as a test set. I can calculate the mean squared prediction error (MSPE) between predictions and the test data, but I am uncertain how to implement this across k folds. Below is my code for creating the training and test datasets and calculating MSPE. I have not included sample data, but can do so.

library(mgcv)

set.seed(1)  # for reproducibility
indexes <- sample(1:nrow(data), size = 0.2 * nrow(data))
testP  <- data[indexes, ]   # 20% test set
trainP <- data[-indexes, ]  # 80% training set
# fit on the training data (not the full data); x ~ 1 is an intercept-only placeholder
gam0 <- gam(x ~ 1, family = quasibinomial(link = "logit"),
            data = trainP, gamma = 1.4)
pv <- predict(gam0, newdata = testP, type = "response")
diff  <- pv - testP$x  # predicted - observed
diff2 <- diff^2        # (predicted - observed)^2
mspegam0 <- mean(diff2)  # MSPE on the held-out set
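For reference, here is one way I imagine the k-fold version could look: a sketch, not a definitive implementation. Since I have not included my data, the data frame below (columns `z` and `x`) and the smooth term `s(z)` are synthetic stand-ins for the real covariates and model formula.

```r
library(mgcv)

set.seed(42)
# synthetic stand-in for the (not included) data: binary response x, covariate z
data <- data.frame(z = runif(100))
data$x <- rbinom(100, 1, plogis(2 * data$z - 1))

# five-fold CV: every observation is held out exactly once
k <- 5
fold_id <- sample(rep(1:k, length.out = nrow(data)))  # random fold assignment

mspe <- numeric(k)
for (i in 1:k) {
  testP  <- data[fold_id == i, ]   # hold out fold i
  trainP <- data[fold_id != i, ]   # train on the remaining k-1 folds
  fit <- gam(x ~ s(z), family = quasibinomial(link = "logit"),
             data = trainP, gamma = 1.4)
  pv <- predict(fit, newdata = testP, type = "response")
  mspe[i] <- mean((pv - testP$x)^2)  # MSPE for this fold
}
mspe        # per-fold MSPE
mean(mspe)  # overall cross-validated MSPE
```

The random `fold_id` vector partitions the rows once up front, so each fold's test set is disjoint from the others; averaging the per-fold MSPEs gives the cross-validated estimate.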
Gavin Simpson
akbreezo
