12

I would like to change the metric from RMSE to RMSLE using the

 caret library

Given some sample data:

  ivar1<-rnorm(500, mean = 3, sd = 1)
  ivar2<-rnorm(500, mean = 4, sd = 1)
  ivar3<-rnorm(500, mean = 5, sd = 1)
  ivar4<-rnorm(500, mean = 4, sd = 1)
  dvar<-rpois(500, exp(3+ 0.1*ivar1 - 0.25*ivar2))

  data<-data.frame(dvar,ivar4,ivar3,ivar2,ivar1)



  ctrl <- rfeControl(functions=rfFuncs,
                  method="cv",
                  repeats = 5,
                  verbose = FALSE,
                  number=5)

model <- rfe(data[,2:4], data[,1], sizes=c(1:4), rfeControl=ctrl)

Here I would like to change to RMSLE and keeping the idea of the graph

plot <-ggplot(model,type=c("g", "o"), metric="RMSE")+ scale_x_continuous(breaks = 2:4, labels = names(data)[2:4])
ci_
  • 8,594
  • 10
  • 39
  • 63

1 Answers1

11

Im not sure how / if you can easily convert RMSE to RMSLE, so you can try changing the control function.

Look at rfFuncs$summary it calls a function postResample. This is where the RMSE is calculated - look at the section

mse <- mean((pred - obs)^2)
n <- length(obs)
out <- c(sqrt(mse), resamplCor^2)

So you can amend this function to calculate the RMSLE instead:

msle <- mean((log(pred) - log(obs))^2)
out <- sqrt(msle)
}
names(out) <- "RMSLE"

Then if this amended function has been saved in a function called mypostResample, you then need to update the rfFuncs$summary.


So altogether:

First update the summary function - this will call the new function with RMSLE

newSumm <- function (data, lev = NULL, model = NULL) 
          {
          if (is.character(data$obs)) 
          data$obs <- factor(data$obs, levels = lev)
          mypostResample(data[, "pred"], data[, "obs"])
          }

Then define new function to calculate RMSLE

mypostResample <- function (pred, obs) 
               {
               isNA <- is.na(pred)
               pred <- pred[!isNA]
               obs <- obs[!isNA]

               msle <- mean((log(pred) - log(obs))^2)
               out <- sqrt(msle)
               names(out) <- "RMSLE"

               if (any(is.nan(out))) 
                  out[is.nan(out)] <- NA
               out
               }

Update rfFuncs

# keep old settings for future use
oldSumm <- rfFuncs$summary 

# update with new function
rfFuncs$summary <- newSumm

ctrl <- rfeControl(functions=rfFuncs,
                   method="cv",
                   repeats = 5,
                   verbose = FALSE,
                   number=5)
set.seed(1)
model <- rfe(data[,2:4], data[,1], sizes=c(1:4), rfeControl=ctrl, metric="RMSLE")

# plot
ggplot(model,type=c("g", "o"), metric="RMSLE")+ scale_x_continuous(breaks = 2:4, labels = names(data)[2:4])
user20650
  • 24,654
  • 5
  • 56
  • 91
  • May I ask you one additional question: How would you incorporate the presense of zero`s in your mypostResample function? –  Feb 07 '15 at 12:04
  • I guess you could do `(log(1 + obs) - log(1 + pred))^2` . This is what the [Metrics](http://cran.r-project.org/web/packages/Metrics/index.html) package does Check function `sle`. . That said, perhaps this is a question that is better suited to the statisticians on http://stats.stackexchange.com/ – user20650 Feb 07 '15 at 19:05