I've been playing around with the doRedis package in R to try to run some code on a cluster. I've got one Windows machine and one machine running Ubuntu (which is where Redis is installed).

I can run the example from the doRedis documentation without problems, but my goal is to use doRedis together with caret for some machine learning applications. As I understand it, caret supports parallelisation, and others seem to have gotten this to work, but I can't figure out where I'm going wrong.

I found this example and modified it slightly to the following:

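library(caret)
library(doRedis)

dat <- iris

# Register the doRedis backend on the "jobs" queue ("xyz" is the Redis host)
registerDoRedis("jobs",
                host = "xyz")

# Tuning grid: only nrounds varies, so two candidate models
xgb.grid <- expand.grid(nrounds = c(10, 200),
                        max_depth = c(6),
                        eta = c(0.05),
                        gamma = c(0.01),
                        colsample_bytree = 1,
                        min_child_weight = 1,
                        subsample = 1)

# 10-fold cross-validation, with parallel execution allowed
ctrl <- trainControl(method = "cv",
                     number = 10,
                     verboseIter = FALSE,
                     allowParallel = TRUE)

set.seed(13)
xgb1 <- train(Species ~ .,
              data = dat,
              method = "xgbTree",
              trControl = ctrl,
              verbose = FALSE,
              tuneGrid = xgb.grid)

# Clean up the work queue
removeQueue("jobs")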

This only runs on the local machine and isn't distributed to the Redis queue. I can see this using doRedis::jobs() and by running redis-cli --stat in the Ubuntu terminal, both of which show no jobs being passed to the server.

What am I missing?

Flobagob

1 Answer

Please check out https://topepo.github.io/caret/parallel-processing.html

Relevant quote:

train, rfe, sbf, bag and avNNet were given an additional argument in their respective control files called allowParallel that defaults to TRUE. When TRUE, the code will be executed in parallel if a parallel backend (e.g. doMC) is registered.
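Since the quote makes parallel execution conditional on a backend being registered, it may help to ask foreach what it currently sees just before calling `train`. The following is a minimal sketch using `getDoParRegistered()`, `getDoParName()` and `getDoParWorkers()` from the foreach package; the helper name `describe_backend` is made up for illustration and is not part of caret or doRedis.

```r
library(foreach)

# Hypothetical helper: summarise the backend foreach currently has registered.
describe_backend <- function() {
  list(
    registered = getDoParRegistered(),  # has any %dopar% backend been registered?
    name       = getDoParName(),        # name reported by the backend, NULL if none
    workers    = getDoParWorkers()      # number of workers the backend reports
  )
}

# Print the summary; run this right after the registerDoRedis(...) call.
str(describe_backend())
```

If the reported name is not the doRedis backend at the point where `train` is called, the registration is probably being lost or overridden somewhere; if it is, the problem more likely lies on the worker or queue side.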

One suggestion for debugging: first try Redis locally; if that works, switch to the other server.
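A local check along those lines could look roughly like the sketch below. It assumes a Redis server is reachable at the given host and uses `registerDoRedis`, `startLocalWorkers` and `removeQueue` from doRedis; the helper name `check_redis_backend`, the default queue name and the worker count are illustrative choices, not anything prescribed by the package.

```r
library(foreach)
library(doRedis)

# Hypothetical helper: run a trivial foreach loop through a doRedis queue.
# Assumes a Redis server is listening on `host` (default port).
check_redis_backend <- function(queue = "jobs", host = "localhost", n_workers = 2) {
  registerDoRedis(queue, host = host)
  on.exit(removeQueue(queue), add = TRUE)  # remove the queue when done

  # Start worker R processes on this machine that listen on the queue
  startLocalWorkers(n = n_workers, queue = queue, host = host)

  # Each task returns the process id of whichever R process executed it
  pids <- foreach(i = 1:4, .combine = c) %dopar% Sys.getpid()
  unique(pids)
}
```

Calling `check_redis_backend()` against a local server first, and then `check_redis_backend(host = "xyz")`, separates problems in the code from problems in the setup. Process ids that differ from `Sys.getpid()` in the calling session would suggest the tasks really ran in worker processes.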

Rick
  • Thanks for the suggestion. I've set `allowParallel` to TRUE with no success. I don't think I understand why using redis locally would be of any help - wouldn't it just be the same as running it in parallel on my CPU cores? – Flobagob Aug 05 '20 at 06:56
  • Yes it would, but you would see if the problem is e.g. host resolution. If it works locally, you know that the problem is not with your code, but with the setup. – Rick Aug 05 '20 at 08:16
  • I don't think it's anything wrong with the setup. I can run a workaround for the above problem using `foreach` and that works fine, I'd just prefer to use `caret`'s built-in parallelization. – Flobagob Aug 05 '20 at 13:44
  • Yup, I understand what you mean. Going by the docs, you seem to be doing everything correctly. Please also check out https://stackoverflow.com/questions/44774516/parallel-processing-in-r-in-caret. There, `allowParallel` is passed to the `train` method. – Rick Aug 05 '20 at 14:56