I have a question similar to this (link), except that mine concerns the Java tool 'h2o' and its R interface.

In particular, I want to assign an "h2o" object to one element of a vector (or structure or array). I want to loop through and store several of them without having to enumerate them manually.

I tried the solution at the link but it does not work for 'h2o' objects.

Here is my longer code (warts and all):

#libraries
library(h2o)      #for tree control

#specify data
mydata <- iris[iris$Species!="setosa",]
mydata$Species <- as.factor(as.character(mydata$Species))

#most informative variable is petal length
x1 <- mydata$Petal.Length
x2 <- mydata$Petal.Width

#build classes
C <- matrix(0,nrow=length(x1),ncol=1)
idx1 <- which(mydata$Species == "versicolor",arr.ind=T)
idx2 <- which(mydata$Species != "versicolor",arr.ind=T)
C[idx1] <- +1
C[idx2] <- 0

#start h2o
localH2O = h2o.init(nthreads = -1)

# Run regression GBM on iris.hex data
irisPath = system.file("extdata", "iris.csv", package="h2o")
iris.hex = h2o.uploadFile(localH2O, path = irisPath)
names(iris.hex) <- c("Sepal.Length",
                     "Sepal.Width",
                     "Petal.Length",
                     "Petal.Width",
                     "Species" )

iris2 <- iris
iris2$Species <- unclass(iris$Species)
iris2.hex <- as.h2o(iris2)
iris.hex$Species <- as.factor(iris2.hex$Species)

independent <- c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")
dependent <- "Species"

mare <- numeric()
mae <- matrix(1,nrow=10,ncol=1)

est2.h2o <- vector(mode="list", length=150)

for (i in 1:150){

     est2.h2o[[i]] <- h2o.gbm(y = dependent, 
                         x = independent, 
                         training_frame = iris.hex,
                         distribution="AUTO",
                         ntrees = i, max_depth = 3, min_rows = 2,
                         learn_rate = 0.5)


     pred <- h2o.predict(est2.h2o,newdata=iris.hex)

     err <- iris2$Species-(as.data.frame(pred)$predict+1)

     mae[i] <- mean(abs(err))
     mare[i] <- mean(abs(err)/iris2$Species)

     print(c(i,log10(mae[i])))

}

The error that I get is:

Error in paste0("Predictions/models/", object@model_id, "/frames/", newdata@frame_id) : 
  trying to get slot "model_id" from an object of a basic class ("list") with no slots

My intention is to have a list/structure/array of GBMs that I can then run predict against for the whole data set, and cull the less informative ones. I'm trying to make a decent "random forest of GBTs" following the steps of Eugene Tuv. I don't have his code.

Questions:
Is there a proper way to pack the h2o gbm along with a few (hundred) of its buddies, into a single store in r?

If the referenced object is thrown away in java, making this sort of approach unfeasible, is there a feasible variation using the 'gbm' library? If I end up having to use gbm, what is the speed difference vs. h2o?

EngrStudent
  • Can you give me an example of the parameters you're using in h2o.gbm? – Shape Nov 25 '15 at 22:16
  • doesn't run just yet, I think I need an API key. But if you `lapply` on 1:150, you'll get your list, then I believe `sapply(est.h2o, h2o.predict, newdata = iris.hex)` should generate a data.frame, and then you can do the remainder as vector calculations. The error you're getting is from using the wrong type object – Shape Nov 26 '15 at 01:39
  • nah. R and h2o work together but need rjava with the 32 bit version. I have a 64 bit cpu, but the r itself is 32 bit, and there is some bitwise compatibility issue. I will see if I can wrap it in a "sapply". – EngrStudent Nov 26 '15 at 02:08
  • Ah okay, that makes sense, I'll try it out, h2o looks neat – Shape Nov 26 '15 at 02:11
  • There are 2 reasons I like h2o. 1) It will use all my cores and most of my memory while r doesn't, which makes it substantially faster, and 2) other tools wrap it, like domino and amazon. (It also works with java, scala, r and python, but it had me at 'r'.) – EngrStudent Nov 26 '15 at 02:15

1 Answer

Without seeing the exact parameters you're using, my guess is that the problem is that you're using sapply rather than lapply.

sapply often attempts to simplify the result, which is good most of the time. But, if you want something that can contain any kind of object, then you want a list.
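
As a rough illustration of that difference, here is a minimal base-R sketch (no h2o required; the names simplified and kept are made up for this example):

```r
# sapply() simplifies equal-length results into an atomic vector
simplified <- sapply(1:3, function(i) i^2)
is.list(simplified)   # FALSE

# lapply() always returns a list, one element per input
kept <- lapply(1:3, function(i) i^2)
is.list(kept)         # TRUE
length(kept)          # 3
```

A list is the container that can hold arbitrary objects, such as fitted models, one per element.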

If we define paramListList as a list where each entry is itself a list containing your parameters for one h2o.gbm call, for example:

# xVALUES1, yVALUES1, etc. are placeholders for your own values
paramListList <- list(list(x = xVALUES1, 
                           y = yVALUES1, 
                           training_frame = tfVALUES1, 
                           model_id = miVALUES1, 
                           checkpoint = checkVALUES1),
                      list(x = xVALUES2, 
                           y = yVALUES2, 
                           training_frame = tfVALUES2, 
                           model_id = miVALUES2, 
                           checkpoint = checkVALUES2)   # no trailing comma: list(a, b, ) is an error
                     )

then you can do the following:

models <- lapply(paramListList, function(paramlist) do.call(h2o.gbm, paramlist))

which will put all of your fitted models in that one list, models.
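
To make the pattern concrete without needing a running H2O cluster, here is a sketch that swaps in base R's lm() as a stand-in for h2o.gbm; the formulas and the names models, whole_list, preds and mae are assumptions for illustration only. It also shows why the predict step needs a single model rather than the whole list: the error quoted in the question says a "list" was passed where a model object was expected, so in the question's loop the analogous call would presumably be h2o.predict(est2.h2o[[i]], newdata = iris.hex).

```r
# Stand-in sketch: lm() replaces h2o.gbm() so this runs without an H2O cluster.
# The parameter sets below are illustrative, not taken from the question.
paramListList <- list(
  list(formula = Petal.Width ~ Petal.Length, data = iris),
  list(formula = Petal.Width ~ Petal.Length + Sepal.Length, data = iris)
)

# One fitted model per parameter set, kept in a plain list
models <- lapply(paramListList, function(paramlist) do.call(lm, paramlist))

# Passing the whole list to predict() fails, because a list is not a model
whole_list <- try(predict(models, newdata = iris), silent = TRUE)
inherits(whole_list, "try-error")

# Indexing with [[i]] hands predict() exactly one model
preds <- lapply(seq_along(models),
                function(i) predict(models[[i]], newdata = iris))

# Per-model mean absolute error, ready for culling the weaker models
mae <- sapply(preds, function(p) mean(abs(iris$Petal.Width - p)))
```

Single brackets (models[i]) would return a one-element list, so double brackets are the safer choice whenever a function expects the model object itself.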

Shape