I'm performing K-fold cross-validation with K = 10 for polynomials of degree 1 to 5, in order to identify which polynomial fits the provided data best. However, when I try to predict y-hat using the test data (x_test), which has length 32, R warns me that `newdata` had 32 rows but the variables it found have 288 rows, which is the length of the training data. I don't understand why this happens.
My understanding is that after fitting the glm and then calling predict, I should get 32 predicted values of y, one for each of the 32 points in the x_test set.
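A minimal sketch of that expectation on made-up data (the names `d`, `fit`, `new_x`, `fit2` and the simulated values are illustrative only). `predict()` looks up the formula's variables by name in `newdata`. It returns one value per row of `newdata` only when `newdata` has a column with the same name as the predictor in the formula. A variable it cannot find there is taken from the formula's environment instead, which is where a 288-row vector can come from:

```r
set.seed(1)
# Hypothetical training data: 288 points, predictor stored in column "x"
d <- data.frame(x = runif(288))
d$y <- 1 + 2 * d$x + rnorm(288, sd = 0.1)
new_x <- data.frame(x = runif(32))  # 32 new points, column also named "x"

# Case 1: the formula refers to the column "x", which newdata also has
fit <- glm(y ~ poly(x, 2), data = d)
y_hat <- predict(fit, newdata = new_x)
length(y_hat)   # 32

# Case 2: the formula refers to free vectors x_train / y_train.
# newdata has no column called "x_train", so the 288-value x_train from the
# environment is used: R warns "'newdata' had 32 rows but variables found
# have 288 rows" and returns 288 fitted values
x_train <- d$x
y_train <- d$y
fit2 <- glm(y_train ~ poly(x_train, 2))
y_hat2 <- predict(fit2, newdata = new_x)
length(y_hat2)  # 288
```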
...
Warning: 'newdata' had 32 rows but variables found have 288 rows
Warning: 'newdata' had 32 rows but variables found have 288 rows
Warning: 'newdata' had 32 rows but variables found have 288 rows
Warning...
Here is my code:
library(magrittr)  # provides the %>% pipe used below
k = 10
CVMSE = matrix(NA, nrow = k, ncol = 5)  # one row per fold, one column per degree
set <- 1:320
# shuffle x and y (each with its own, independent permutation)
random_x = sample(train_x, size = length(train_x))
random_y = sample(train_noisy_y, size = length(train_noisy_y))
# split the shuffled x values into k equal-sized groups
n <- length(train_x)
k <- 10
group_sizes_x <- rep(floor(n/k), k)
groups_x <- split(random_x, rep(1:k, group_sizes_x))
# split the shuffled y values into k equal-sized groups
n <- length(train_noisy_y)
k <- 10
group_sizes_y <- rep(floor(n/k), k)
groups_y <- split(random_y, rep(1:k, group_sizes_y))
for (deg in 1:5) {
  for (i in 1:k) {
    x_test <- groups_x[[i]] %>% unlist()
    y_test <- groups_y[[i]] %>% unlist()
    x_train <- groups_x[-i] %>% unlist()
    y_train <- groups_y[-i] %>% unlist()
    model <- glm(y_train ~ poly(x_train, deg))
    y_pred <- predict.glm(model, newdata = data.frame(x = x_test))
    CVMSE[i, deg] <- mean((y_test - y_pred)^2)
  }
}
meanCVMSE = apply(CVMSE, 2, mean)
meanCVMSE
In the end I do get meanCVMSE, but together with the warning mentioned above.
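For comparison, here is a sketch of how the loop could be restructured. It assumes `train_x` and `train_noisy_y` are numeric vectors of equal length (320 here). They are simulated below so the block runs on its own, and the simulated data is not the question's data. Two changes seem relevant. First, x and y live in one data frame and each row gets a single random fold label, so every (x, y) pair stays together. Shuffling `train_x` and `train_noisy_y` with two separate `sample()` calls appears to break that pairing. Second, the model is fitted as `y ~ poly(x, deg)` with `data =`, so `predict()` finds a column named `x` in `newdata` and returns one prediction per test row:

```r
set.seed(42)
# Simulated stand-ins for the question's vectors (assumption: 320 paired points)
train_x <- runif(320, -2, 2)
train_noisy_y <- 1 - train_x + 0.5 * train_x^3 + rnorm(320, sd = 0.5)

dat <- data.frame(x = train_x, y = train_noisy_y)
n <- nrow(dat)
k <- 10
max_deg <- 5

# One random fold label per row, so each x stays with its own y
fold <- sample(rep(1:k, length.out = n))

CVMSE <- matrix(NA, nrow = k, ncol = max_deg)  # folds x degrees

for (deg in 1:max_deg) {
  for (i in 1:k) {
    train <- dat[fold != i, ]
    test  <- dat[fold == i, ]
    # The formula uses column names of `train`; predict() then looks for a
    # column called "x" in `test` and returns one value per test row
    model <- glm(y ~ poly(x, deg), data = train)
    y_pred <- predict(model, newdata = test)
    CVMSE[i, deg] <- mean((test$y - y_pred)^2)
  }
}

meanCVMSE <- colMeans(CVMSE)  # average CV error for each degree
meanCVMSE
```

Using `sample(rep(1:k, length.out = n))` also avoids the `floor(n/k)` group sizes, which would not cover every point if n were not a multiple of k.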