0

I am trying to perform a 5-fold cross-validation for a model I estimated with the gamlss package. When I use the same code and estimate another model (e.g. OLS), I have no problems. However, when I change the model to a gamlss, I get an error message.

Here is an illustative example:

# load packages and data
library(caret)
library(gamlss)
data(usair)

# create 5 folds
folds <- createFolds(usair$y, k = 5)

When I run this code, everything works fine and I get a list containing my performance measures for each fold:

### 1) OLS
# estimate model 5 times and get performance measures
res1 <- lapply(folds, function(x) {
  # Create training and test data set
  trainset <- usair[-x, ]
  testset <- usair[x, ]
  # estimate the model with the training data set
  m1<- lm(y~ x1 + x2 + x3 + x4 + x5 + x6,
              data=trainset)
  # predict outcomes with the test data set
  y_pred <- predict(m1, newdata = testset)
  # store the actual outcome values in a vector
  y_true <- testset$y
  # Store performance measures
  MAE <-  sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
  MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error  
  MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
  R2 <- 1-MSE/var(y_true)
  list(MAE=MAE,
       MSE=MSE,
       MAPE=MAPE,
       R2= R2)
})

However, when I run this code, changing the model type to a gamlss, I get an error message:

### 2) gamlss
# estimate model 5 times and get performance measures
res2 <- lapply(folds, function(x) {
  # Create training and test data set
  trainset <- usair[-x, ]
  testset <- usair[x, ]
  # estimate the model with the training data set
m1<- gamlss(y~ri(x.vars=c("x1","x2","x3","x4","x5","x6"), Lp =1),
            data=trainset)
# predict outcomes with the test data set
y_pred <- predict(m1, newdata = testset)
# store the actual outcome values in a vector
y_true <- testset$y
# Store performance measures
MAE <-  sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error  
MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
R2 <- 1-MSE/var(y_true)
list(MAE=MAE,
     MSE=MSE,
     MAPE=MAPE,
     R2= R2)
})

The error message is: "Error in eval(substitute(Data)) : object 'trainset' not found". I've run the code in the function separately for each fold and it works. It seems like the train- and test set can suddenly not be created anymore. However, all I did was changing the model.

Does anyone have an idea what could be the problem here?

Nerd
  • 61
  • 5

1 Answers1

1

You need to specify the formula parameter like this:

res2 <- lapply(folds, function(x) {
  # Create training and test data set
  trainset <- usair[-x, ]
  testset <- usair[x, ]
  # estimate the model with the training data set
  m1<- gamlss(formula=y~ri(x.vars=c("x1","x2","x3","x4","x5","x6"), Lp =1),
              data=trainset)
  # predict outcomes with the test data set
  y_pred <- predict(m1, newdata = testset)
  # store the actual outcome values in a vector
  y_true <- testset$y
  # Store performance measures
  MAE <-  sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
  MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error  
  MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
  R2 <- 1-MSE/var(y_true)
  list(MAE=MAE,
       MSE=MSE,
       MAPE=MAPE,
       R2= R2)
})

Results:

> # estimate model 10 times and get performance measures
> res2 <- lapply(folds, function(x) {
+   # Create training and test data set
+   trainset <- usair[-x, ]
+   testset <- usair[x, ]
+   # estimate the model with the training data set
+   m1<- gamlss(formula=y~ri(x.vars=c("x1","x2","x3","x4","x5","x6"), Lp =1),
+               data=trainset)
+   # predict outcomes with the test data set
+   y_pred <- predict(m1, newdata = testset)
+   # store the actual outcome values in a vector
+   y_true <- testset$y
+   # Store performance measures
+   MAE <-  sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
+   MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error  
+   MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
+   R2 <- 1-MSE/var(y_true)
+   list(MAE=MAE,
+        MSE=MSE,
+        MAPE=MAPE,
+        R2= R2)
+ })
GAMLSS-RS iteration 1: Global Deviance = 281.937 
GAMLSS-RS iteration 2: Global Deviance = 281.9348 
GAMLSS-RS iteration 3: Global Deviance = 281.9348
AugtPelle
  • 549
  • 1
  • 10
  • Thanks for the reply. Unfortunately, that still didn't fix the problem. – Nerd Jan 28 '22 at 08:21
  • 1
    I edit the message showing the results. Let me know if you are getting the same ones. – AugtPelle Jan 28 '22 at 14:54
  • No, I get the same error message: "Error in eval(substitute(Data)) : object 'trainset' not found". Thank you for posting your results, though. It is good to know that it works for you. Maybe it has to do with my environment.. I still haven't figured out the problem. – Nerd Jan 31 '22 at 10:35
  • 1
    Did you try cleaning the entire environment and running from scratch ? I'm a bit clueless now, because I can guarantee you this work. I use linux, RStudio and R version 3.6.2 – AugtPelle Jan 31 '22 at 23:56