I have following data. My goal is to calculate mean squared prediction error (MSPE)using cross validation by replicating 100 times.
y=rnorm(250,0,2)
x1=rnorm(250,0,3)
x2=rnorm(250,1,1)
datasim=data.frame(y,x1,x2)
From this first i need to split the data in to training and test data. So i first calculated the indices using sample.int function in R. Based on those indices i split the data into training and test set.
dd=replicate(100,sample.int(n = nrow(datasim),
size = floor(.75*nrow(datasim)), replace = F))
train_set=apply(dd,2,function(y)
datasim[y, ])
test_set=apply(dd,2,function(y)
datasim[-y, ])
After that i have to use the training data to fit the model. And based on test data , i need to predict and obtain mean squared prediction error (MSPE). I don't know how to proceed from here. Especially i dont know how to link the training set and test set so that i can predict and calculate MSPE.
I tried it using this using a lapply function which is inside of another lapply function.
lapply(test_set, function(train_set) {
lapply(train_set,function(x)
mean((test_set$y- predict.lm(y ~ x1 + x2, data = train_set))^2)
}
))
But there seems to be a problem with this.Can anyone help me to figure out this ? Also is there any easier method to do this than this method ?
Thank you