
I'm trying to use MAPE as the metric to evaluate the performance of a model.

With LOOCV and parallel execution everything works properly, but if I use another resampling method I get this error:

    Error in { : task 1 failed - "could not find function "mape""

In serial execution, the issue disappears.

The code below provides an example.

    library(caret)
    library(doParallel)

    data("environmental")

    registerDoParallel(makeCluster(detectCores(), outfile = ''))



    mape <- function(y, yhat) mean(abs((y - yhat)/y))

    mapeSummary <- function (data, lev = NULL, model = NULL) {

                       out <- mape(data$obs, data$pred)
                       names(out) <- "MAPE"

                       out
                     }



    #LOOCV - parallel
    trControlLoocvPar <- trainControl(allowParallel = T,
                                      verboseIter = T, 
                                      method = "LOOCV",
                                      summaryFunction = mapeSummary)

    #LOOCV - serial
    trControlLoocvSer <- trainControl(allowParallel = F,
                                      verboseIter = T, 
                                      method = "LOOCV",
                                      summaryFunction = mapeSummary)

    #Bootstrapping - parallel
    trControlBootPar <- trainControl(allowParallel = T,
                                      verboseIter = T, 
                                      method = "boot",
                                      summaryFunction = mapeSummary)

    #Bootstrapping - serial
    trControlBootSer <- trainControl(allowParallel = F,
                                      verboseIter = T, 
                                      method = "boot",
                                      summaryFunction = mapeSummary)


    trControlList <- list(trControlLoocvSer, 
                          trControlLoocvPar,
                          trControlBootSer,
                          trControlBootPar)


    models <- lapply(trControlList, 
                     function(control) {

                       train(y = environmental$ozone,
                       x = environmental[, -1], 
                       method = "glmnet", 
                       trControl = control, 
                       metric = "MAPE", 
                       maximize = FALSE)
                     })

My OS is El Capitan 10.11.4 and the version of caret is 6.0.62.

amarchin

1 Answer


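As the message states, your parallel processes cannot find the mape function. Each worker in the cluster is a separate R session, and mape, which lives in your global environment, is not copied to the workers along with mapeSummary.

The easiest solution is to define mape inside mapeSummary, as below, so that it travels with the summary function. Then your parallel processes will work correctly.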

    mapeSummary <- function (data, lev = NULL, model = NULL) {
      mape <- function(y, yhat) mean(abs((y - yhat)/y))
      out <- mape(data$obs, data$pred)
      names(out) <- "MAPE"

      out
    }
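As a quick sanity check, the self-contained summary function can be called directly on a small data frame before handing it to caret. This is only a sketch: the `toy` and `zero_case` data frames are made-up illustrations, not part of the original question, and the only assumption about caret is that it passes a data frame with `obs` and `pred` columns, as the code above already relies on.

```r
# Self-contained summary function: mape is defined inside, so no
# lookup in the global environment is needed.
mapeSummary <- function(data, lev = NULL, model = NULL) {
  mape <- function(y, yhat) mean(abs((y - yhat) / y))
  out <- mape(data$obs, data$pred)
  names(out) <- "MAPE"
  out
}

# Hypothetical toy data: relative errors are 0.1, 0.1 and 0.
toy <- data.frame(obs = c(100, 200, 400), pred = c(110, 180, 400))
res <- mapeSummary(toy)
print(res)

# Caveat: an observed value of zero makes the ratio infinite.
zero_case <- mapeSummary(data.frame(obs = c(0, 100), pred = c(1, 100)))
print(zero_case)
```

The second call shows that MAPE is not finite when an observed value is exactly zero, which is worth checking for in the response before using it as the selection metric.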

Bonus:

You can also use clusterEvalQ, one of the clusterApply family of functions in the parallel package, to define mape on every worker before registering the cluster. It works as shown below, but it is not the most elegant solution and requires more typing:

    # Create the cluster and define mape on every worker
    cl <- makePSOCKcluster(detectCores() - 1)
    clusterEvalQ(cl, mape <- function(y, yhat) mean(abs((y - yhat)/y)))
    registerDoParallel(cl)

    mapeSummary <- function (data, lev = NULL, model = NULL) {
      out <- mape(data$obs, data$pred)
      names(out) <- "MAPE"
      out
    }

    #Bootstrapping - parallel
    trControlBootPar <- trainControl(allowParallel = TRUE,
                                     verboseIter = TRUE,
                                     method = "boot",
                                     summaryFunction = mapeSummary)

    train(y = environmental$ozone,
          x = environmental[, -1],
          method = "glmnet",
          trControl = trControlBootPar,
          metric = "MAPE",
          maximize = FALSE)

    # Shut down the workers and return to sequential execution
    stopCluster(cl)
    registerDoSEQ()
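A related option, not used in the answer above, is `clusterExport` from the parallel package, which copies an object that already exists on the master to each worker instead of retyping its definition. The sketch below uses only the parallel package and a two-worker cluster (an arbitrary choice for illustration) to show that a fresh worker does not know `mape` until it is exported. It does not involve caret or doParallel.

```r
library(parallel)

mape <- function(y, yhat) mean(abs((y - yhat) / y))

# Two workers are enough for the illustration.
cl <- makePSOCKcluster(2)

# Each worker is a fresh R session, so mape is not defined there yet.
before <- unlist(clusterEvalQ(cl, exists("mape")))

# Copy mape from the current environment to every worker.
clusterExport(cl, "mape", envir = environment())

after <- unlist(clusterEvalQ(cl, exists("mape")))

# The workers can now call mape themselves.
res <- clusterCall(cl, function() mape(c(100, 200), c(110, 180)))

stopCluster(cl)

print(before)
print(after)
```

With the cluster from the answer, the same `clusterExport(cl, "mape")` call would presumably replace the `clusterEvalQ` line, provided `mape` is defined on the master first.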
phiver