
Has anyone been able to set up a classification (not a regression) using randomForest AND the bigmemory library? I am aware that the "formula approach" cannot be used and that we have to resort to the "x = predictors, y = response" approach. It appears that the bigmemory library is unable to deal with a response vector that has categorical values (it's a matrix, after all). In my case, I have two levels, both represented as characters.

According to the bigmemory documentation: "A data frame will have character vectors converted to factors, and then all factors converted to numeric factor levels."
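
To make that conversion concrete, here is a minimal base-R sketch of what "character to factor to numeric level" means. It does not call bigmemory at all, and the vector `resp` is a hypothetical two-level response made up for illustration.

```r
# Hypothetical two-level character response (not from the real data)
resp <- c("yes", "no", "yes", "yes")

# Step 1: character -> factor (levels are sorted alphabetically: "no", "yes")
f <- factor(resp)

# Step 2: factor -> integer level codes, which is all a numeric matrix can hold
codes <- as.integer(f)

# The labels can be recovered only if the levels are kept somewhere else
recovered <- levels(f)[codes]
```

The point of the sketch is that the class labels survive only as integer codes, so the level names have to be stored separately if they are needed again.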

Any suggested workarounds to get randomForest classification to work with bigmemory?

# EXAMPLE of the problem
library(randomForest)
library(bigmemory)
# Removing any extra objects from my workspace (just in case)
rm(list=ls())

#first small matrix
small.mat <- matrix(sample(0:1,5000,replace = TRUE),1000,5)
colnames(small.mat) <- paste("V",1:5,sep = "")
small.mat[,5] <- as.factor(small.mat[,5]) 
small.rf <- randomForest(V5 ~ .,data = small.mat, mtry=2, do.trace=100)
print(small.rf)
small.result <- matrix(0,1000,1)
small.result <- predict(small.rf, newdata=small.mat[,-5])

#now small dataframe Works!
small.mat <- matrix(sample(0:1,5000,replace = TRUE),1000,5)
colnames(small.mat) <- paste("V",1:5,sep = "")
small.data <- as.data.frame(small.mat)

small.data[,5] <- as.factor(small.data[,5]) 
small.rf <- randomForest(V5 ~ .,data = small.data, mtry=2, do.trace=100)
print(small.rf)
small.result <- matrix(0,1000,1)
small.result <- predict(small.rf, newdata=small.data[,-5])


#then big matrix Classification Does NOT Work :-(
#----------------****************************----
big.mat <- as.big.matrix(small.mat, type = "integer")
#Line below throws error, "cannot coerce class 'structure("big.matrix", package = "bigmemory")' into a data.frame"
big.rf <- randomForest(V5~.,data = big.mat, do.trace=10)

#Runs without error but only regression
big.rf <- randomForest(x = big.mat[,-5], y = big.mat[,5], mtry=2, do.trace=100)
print(big.rf)
big.result <- matrix(0,1000,1)
big.result <- predict(big.rf, newdata=big.mat[,-5])
cdeterman
auro
  • Coerce to factor via `y = as.factor(big.mat[,5])`? – joran Apr 29 '12 at 05:58
  • I should add that I have no idea if `randomForest` actually supports big.matrix input when the object is truly too large for memory. – joran Apr 29 '12 at 06:02
  • As far as I know, `randomForest` loads all `bigmemory` data into RAM when the model is built. – DrDom Apr 29 '12 at 06:25
  • @joran, thanks! Coercing elicits the error below.
    `Error in as.data.frame.default(data) : cannot coerce class 'structure("big.matrix", package = "bigmemory")' into a data.frame`
    `> big.response.vec <- as.factor(big.mat[,5])`
    – auro Apr 29 '12 at 16:25
  • Runs fine for me. But I'm still pretty sure that using bigmemory with randomForest isn't going to accomplish anything: the values will have to all be loaded into memory to be handed off to the C code. – joran Apr 29 '12 at 16:31
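
For reference, the coercion suggested in the comments can be sketched as below, reusing the question's toy data. This is a sketch, not a tested recipe for truly large data. The names `x` and `y`, the `set.seed` call, and `ntree = 100` are assumptions added for illustration. Note that `big.mat[, -5]` and `big.mat[, 5]` return ordinary in-memory R objects, so, as the comments point out, nothing here avoids loading the data into RAM.

```r
library(randomForest)
library(bigmemory)

set.seed(1)  # assumption: added only to make the toy data reproducible

# Same toy data as in the question
small.mat <- matrix(sample(0:1, 5000, replace = TRUE), 1000, 5)
colnames(small.mat) <- paste("V", 1:5, sep = "")
big.mat <- as.big.matrix(small.mat, type = "integer")

# Indexing a big.matrix pulls the values into regular R objects
x <- big.mat[, -5]            # ordinary integer matrix of predictors
y <- as.factor(big.mat[, 5])  # factor response triggers classification

# x/y interface, not the formula interface
big.rf <- randomForest(x = x, y = y, mtry = 2, ntree = 100)
print(big.rf$type)

# Use newdata (not data) to predict on supplied rows
big.pred <- predict(big.rf, newdata = x)
```

If the response started out as character labels, the factor levels would need to be kept outside the big.matrix, since the matrix itself can only store the numeric codes.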

1 Answer

The bigrf package may help. Currently, it supports classification with a limited number of features.

Nhan Vu