I'm performing K-fold cross-validation with K = 10 for polynomials of degree 1 to 5, with the purpose of identifying which polynomial best fits the data provided. However, when I try to predict Y-hat using the test data (x_test), which has length 32, R shows me a warning saying that the predictions have been adjusted to the length of the training data, which has 288 points, and I don't really understand why this happens.

What I believe is that after fitting the glm and then predicting, I should get 32 predicted y values for the 32 points included in the x_test set.

    ...
    Warning: 'newdata' had 32 rows but variables found have 288 rows
    Warning: 'newdata' had 32 rows but variables found have 288 rows
    Warning: 'newdata' had 32 rows but variables found have 288 rows
    ...

Here is my code:

    k = 10
    CVMSE = matrix(NA, nrow = k, ncol = 5)

    set <- 1:320
    random_x = sample(train_x, size = length(train_x))
    random_y = sample(train_noisy_y, size = length(train_noisy_y))
    
    n <- length(train_x)
    k <- 10
    group_sizes_x <- rep(floor(n/k), k)


    groups_x <-split(random_x, rep(1:k,group_sizes_x))
  
    n <- length(train_noisy_y)
    k <- 10
    group_sizes_y <- rep(floor(n/k), k)
    groups_y <-split(random_y, rep(1:k,group_sizes_y))

    for (deg in 1:5) {
      for (i in 1:k) {
        x_test <- groups_x[[i]] %>% unlist()
        y_test <- groups_y[[i]] %>% unlist()
        x_train <- groups_x[-i] %>% unlist()
        y_train <- groups_y[-i] %>% unlist()

        model <- glm(y_train ~ poly(x_train, deg))
        y_pred <- predict.glm(model, newdata = data.frame(x = x_test))
        CVMSE[i, deg] <- mean((y_test - y_pred)^2)
      }
    }

    meanCVMSE = apply(CVMSE, 2, mean)
    meanCVMSE

At the end I get meanCVMSE, but with the warning I mentioned before.
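
A likely explanation for the warning, offered as a sketch rather than a definitive diagnosis: the model is fitted with the formula `y_train ~ poly(x_train, deg)`, so `predict` looks for a variable named `x_train` in `newdata`. The data frame passed in only has a column named `x`, so R falls back to the `x_train` vector in the workspace (288 values) and returns 288 predictions. Separately, shuffling `train_x` and `train_noisy_y` with two independent `sample()` calls appears to break the pairing between each x and its y. The version below fits on a data frame with columns `x` and `y` and uses one shared fold assignment. The synthetic data and the names `fold`, `train` and `test` are assumptions for illustration and do not come from the original code.

    # Hypothetical synthetic data standing in for train_x / train_noisy_y (320 points).
    set.seed(1)
    train_x <- runif(320, -2, 2)
    train_noisy_y <- 1 + 2 * train_x - train_x^2 + rnorm(320, sd = 0.5)

    k <- 10
    n <- length(train_x)

    # One shared, shuffled fold label per observation keeps each x paired with its y.
    fold <- sample(rep(1:k, length.out = n))

    CVMSE <- matrix(NA, nrow = k, ncol = 5)

    for (deg in 1:5) {
      for (i in 1:k) {
        train <- data.frame(x = train_x[fold != i], y = train_noisy_y[fold != i])
        test  <- data.frame(x = train_x[fold == i], y = train_noisy_y[fold == i])

        # The formula refers to the column `x`, so `newdata` must also have a column `x`.
        model <- glm(y ~ poly(x, deg), data = train)
        y_pred <- predict(model, newdata = test)

        CVMSE[i, deg] <- mean((test$y - y_pred)^2)
      }
    }

    meanCVMSE <- colMeans(CVMSE)
    meanCVMSE

With the column names matching, `y_pred` should have one value per row of `test` (32 here), and the `'newdata' had 32 rows` warning should no longer appear.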

Progman
Lucpi