
I'm fitting a random forest with the R package ranger to classify a raster image. Calling predict on the raster throws an error; a reproducible example follows.

library(raster)
library(nnet)
library(ranger)
data(iris)

# put iris data into raster
r<-list()
for(i in 1:4){
  r[[i]]<-raster(nrows=10, ncols=15)
  r[[i]][]<-iris[,i]
}
r<-stack(r)
names(r)<-names(iris)[1:4]

# multinom (an example that works)
nn.model <- multinom(Species ~ ., data=iris, trace=F)
nn.pred<-predict(r,nn.model)

# ranger (doesn't work)
ranger.model<-ranger(Species ~ ., data=iris)   
ranger.pred<-predict(r,ranger.model)

The error given is

Error in v[cells, ] <- predv : incorrect number of subscripts on matrix

although the error with my real data is

Error in p[-naind, ] <- predv : number of items to replace is not a multiple of replacement length

The only explanation I can think of is that the ranger.prediction object contains several elements besides the predictions themselves. In any case, how can ranger be used to predict on a raster stack?

Dennis Jaheruddin
Hugo
  • I think that you can get an answer to your question if you open an issue in the [github repository of the ranger package](https://github.com/imbs-hl/ranger/issues). – lampros Oct 18 '17 at 19:12
  • ranger's `predict` is expecting data (see `?predict.ranger`) as a `data.frame` or `gwaa.data`, maybe here is the problem? – m-dz Oct 24 '17 at 16:14

4 Answers

6

Edit, 2021-07-15

There was a question about using clusterR, and I have found a more straightforward approach than the one I first suggested. The new code does the same thing as the original, but more simply and with an option for parallel processing:

# First train the ranger model

ranger.model <- ranger(Species ~ .
                       , data = iris
                       , probability = TRUE  # This argument is needed for se
                       , keep.inbag = TRUE   # So is this one
                       )


# Create prediction function for clusterR

f_se <- function(model, ...) predict(model, ...)$se


# Predict se using clusterR
  
beginCluster(2)

map_se <- clusterR(r
                   , predict
                   , args = list(ranger.model
                                 , type = 'se'  # Remember to include this argument
                                 , fun = f_se
                                 )
                   )

endCluster()

Original answer, 2018-05-31

You can run predictions from a ranger model on a raster stack by training the model within the train function of the caret package:

library(caret)
ranger.model <- train(Species ~ ., data = iris, method = "ranger")  
ranger.pred <- predict(r, ranger.model)

However, this doesn't work if you want to predict the standard error, as the prediction function for train objects does not accept type = 'se'. I got around this by building a function for the purpose using this document:

https://cran.r-project.org/web/packages/raster/vignettes/functions.pdf

# Function to predict standard errors on a raster
predfun <- function(x, model, type, filename)
{
  out <- raster(x)
  bs <- blockSize(out)
  out <- writeStart(out, filename, overwrite = TRUE)
  for (i in 1:bs$n) {
    v <- getValues(x, row = bs$row[i], nrows = bs$nrows[i])
    nas <- apply(v, 1, function(x) sum(is.na(x)))
    p <- numeric(length = nrow(v))
    p[nas > 0] <- NA
    # Predict only for rows without NA, using the type passed to predfun
    p[nas == 0] <- predict(object = model,
                           data = v[nas == 0, , drop = FALSE],
                           type = type)$se
    out <- writeValues(out, p, bs$row[i])
  }
  out <- writeStop(out)
  return(out)
}

# New ranger model 
ranger.model <- ranger(Species ~ .
                       , data = iris
                       , probability = TRUE
                       , keep.inbag  = TRUE
                       )
# Run predictions
se <- predfun(r
              , model = ranger.model
              , type  = "se"
              , filename = paste0(getwd(), "/se.tif")
              )
ABMoeller
  • This worked for me! Is there a way to parallelize the predfun function or use it with clusterR to speed up processing for large rasters? – elyssac Jul 13 '21 at 13:04
  • Hi @elyssac, I added a new piece of code to the answer to show how to predict se using `clusterR` – ABMoeller Jul 15 '21 at 09:12
  • Thanks for the update. The clusterR function is producing the following error: `Error in clusterR(predictors, predict, args = list(object = ranger.model, : cluster error` with both `type = "response"` and `type = "se"`. However, it works when I use the `predict` function, e.g. `pred_se_predict <- predict(predictors, ranger.model, type='se', progress='text', fun = f_se)`. Should I open a reproducible example in a new post? I think adding `num.threads` to the `predict` approach works, but it still seems to run much slower than clusterR (although that may just be `ranger` vs. `randomForest`). – elyssac Jul 15 '21 at 16:02
  • Never mind, @ABMoeller, I have gotten it to work! I accidentally included `object = ranger.model`; `object` shouldn't be there. Thanks for your help! – elyssac Jul 19 '21 at 15:54
4

After a bit of fiddling:

pacman::p_load(raster, nnet, ranger)

data(iris)

# put iris data into raster
r<-list()
for(i in 1:4){
  r[[i]]<-raster(nrows=10, ncols=15)
  r[[i]][]<-iris[,i]
}
r<-stack(r)
names(r)<-names(iris)[1:4]

# multinom (an example that works)
nn.model <- multinom(Species ~ ., data=iris, trace=F)
nn.pred <- predict(r,nn.model)  # predict(object, newdata, type = c("raw","class"), ...)

# ranger (works once the raster is converted to a data.frame)
ranger.model <- ranger(Species ~ ., data=iris)   
ranger.pred <- predict(ranger.model, as.data.frame(as.matrix(r)))

as.data.frame(as.matrix(r)) did it!

Disclaimer: I did not check the output for correctness, so this might produce no results at all, but...

identical(iris$Species, ranger.pred$predictions)
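The predictions above come back as a plain factor, not a raster. One way to get a raster again is to write the predictions into an empty layer with the same geometry as the stack. This is only a minimal sketch under the assumptions of the iris example from the question; the names `cells`, `ok`, `codes` and `pred.raster` are illustrative and not part of either package:

```r
library(raster)
library(ranger)

# Rebuild the 10 x 15 example stack from the question
r <- stack(lapply(1:4, function(i) {
  layer <- raster(nrows = 10, ncols = 15)
  layer[] <- iris[, i]
  layer
}))
names(r) <- names(iris)[1:4]

ranger.model <- ranger(Species ~ ., data = iris)

# Cell values as a data.frame: one row per cell, one column per layer
cells <- as.data.frame(as.matrix(r))

# Only rows without missing predictors are passed to ranger
ok <- complete.cases(cells)

# Integer class codes, left as NA where any predictor is missing
codes <- rep(NA_integer_, nrow(cells))
codes[ok] <- as.integer(predict(ranger.model, data = cells[ok, ])$predictions)

# Empty layer with the geometry of r, filled in cell order
pred.raster <- setValues(raster(r), codes)
```

The integer codes follow the order of `levels(iris$Species)`. Because `as.matrix(r)` reads every cell into memory at once, this sketch is only practical for rasters that fit in RAM.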
m-dz
  • Thanks @m-dz, but the output (i.e. ranger.pred) is not a raster, as it should be. This is in fact the approach I'm currently using: (1) convert the raster to a data.frame, (2) classify the rows of the data.frame, and (3) convert the result back to a raster. However, I'm afraid it will not work for large rasters. – Hugo Oct 25 '17 at 16:57
  • Unfortunately I do not know the answer here, but for sure `predict.ranger` cannot take raster as an input... Maybe it will not be that bad? – m-dz Oct 25 '17 at 17:22
2

It worked for me with randomForest instead of ranger, if that helps:

library(randomForest)
rf.model<-randomForest(Species ~ ., data=iris)   
rf.pred<-predict(r,rf.model)
Antonios
  • Thanks @Antonis, but ranger accepts case weights via the case.weights argument, which I need; randomForest doesn't. That's why I'm using ranger. – Hugo Sep 22 '17 at 12:10
1

Another solution can be found here: https://github.com/imbs-hl/ranger/issues/319

As explained there, calling raster::predict() with a ranger random forest model won't work out of the box because the raster package has no built-in support for ranger.

A workaround to make it work is mentioned by user mnwright. You just have to add a few things to your code:

ranger.pred<-predict(r,ranger.model, fun = function(model, ...) predict(model, ...)$predictions)

This worked for me; the object ranger.pred should now be a raster.
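Why the wrapper is needed can be seen by inspecting what predict() returns for a ranger model. The following is a small sketch using the iris model from the question; `pred_fun` is a hypothetical name for the same anonymous function passed as `fun` in the call to raster's predict:

```r
library(ranger)

ranger.model <- ranger(Species ~ ., data = iris)

# predict() on a ranger model returns a list-like object, not a plain vector
pred <- predict(ranger.model, data = iris[, 1:4])
class(pred)   # "ranger.prediction"

# The per-row predictions sit in the $predictions element.
# pred_fun is a hypothetical name for the wrapper used as `fun`.
pred_fun <- function(model, ...) predict(model, ...)$predictions

# The wrapper returns one value per input row
out <- pred_fun(ranger.model, data = iris[, 1:4])
length(out)
```

The wrapper simply hands back the `$predictions` element, so the raster package receives one value per row instead of the whole ranger.prediction object.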

Dharman
anplaceb