I am trying to determine the best value of spar to use across a dataset by minimizing the root mean squared error (RMSE) between test and training replicates of the same raw data. My test and training replicates look like this:
# training replicate
traindataset <- data.frame(
  t    = c(-0.008, -0.006, -0.004, -0.002, 0, 0.002, 0.004, 0.006, 0.008, 0.01,
           0.012, 0.014, 0.016, 0.018, 0.02, 0.022, 0.024),
  dist = c(NA, 0, 0, 0, 0, 0.000165038, 0.000686934, 0.001168098, 0.001928885,
           0.003147262, 0.004054971, 0.005605361, 0.007192645, 0.009504648,
           0.011498809, 0.013013655, 0.01342625)
)

# test replicate (same time points)
testdataset <- data.frame(
  t    = c(-0.008, -0.006, -0.004, -0.002, 0, 0.002, 0.004, 0.006, 0.008, 0.01,
           0.012, 0.014, 0.016, 0.018, 0.02, 0.022, 0.024),
  dist = c(NA, 0, 0, 0, 0, 0, 0.000481184, 0.001306409, 0.002590156, 0.003328259,
           0.004429246, 0.005012768, 0.005829698, 0.006567801, 0.008030102,
           0.009617453, 0.011202827)
)
I need the spline to be 5th order so that I can accurately predict the 3rd derivative, so I am using smooth.Pspline (from the pspline package) instead of the more common smooth.spline. I attempted a variant of the solution outlined here, using the root mean squared error of predicting testdataset from traindataset instead of the cross-validation sum of squares within a single dataset. My code looks like this:
library(pspline)  # for smooth.Pspline

RMSE <- function(m, o){
  sqrt(mean((m - o)^2))
}

Psplinermse <- function(spar){
  # drop the rows where dist is NA before fitting and scoring
  train <- traindataset[complete.cases(traindataset), ]
  test  <- testdataset[complete.cases(testdataset), ]
  trainmod <- smooth.Pspline(train$t, train$dist, norder = 5, spar = spar)
  testpreddist <- predict(trainmod, test$t)[, 1]
  RMSE(testpreddist, test$dist)
}
spars <- seq(0, 1, by = 0.001)
rmsevals <- rep(0, length(spars))
for (i in seq_along(spars)){
  rmsevals[i] <- Psplinermse(spars[i])
}

plot(spars, rmsevals, type = 'l', xlab = 'spar', ylab = 'RMSE')
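For completeness, once the RMSE curve behaves sensibly the plan is to pick the spar with the lowest test RMSE, something like:

bestspar <- spars[which.min(rmsevals)]  # spar giving the lowest test RMSE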
The issue I am having is that, for pspline, the RMSE values are identical for every spar above 0 (the plot of RMSE against spar is flat for spar > 0). When I dug into just the prediction line of code, I realized I am getting exactly the same predicted values of dist for any spar above 0. Any ideas on why this might be happening are greatly appreciated.
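For context, the end goal is to refit at the chosen spar and pull the third derivative out of the smooth; as I read the pspline documentation, predict.smooth.Pspline takes an nderiv argument, so the sketch (object names are just placeholders) would be roughly:

train <- traindataset[complete.cases(traindataset), ]  # NA row dropped as above
bestmod <- smooth.Pspline(train$t, train$dist, norder = 5, spar = bestspar)
d3dist <- predict(bestmod, train$t, nderiv = 3)[, 1]  # estimated 3rd derivative of dist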