
I'm trying to run a cforest (from the party package) on a dataset of ~70k observations and ~105 variables, one of which is a binary response variable.

The specific settings for the cforest are mtry = 10, ntree = 50, maxsurrogate = 3.

The problem is that building the cforest takes too long (2 hours and 50 minutes), while ranger, for example, takes only 6 minutes for 500 trees and mtry = 10. I know the methodology behind each implementation is quite different, but is this computational cost normal? Am I doing something wrong with the tuning parameters?
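For reference, a minimal version of the call I'm running looks like this (small synthetic data standing in for my real ~70k × 105 data frame; the column names and data here are made up, the control settings are my real ones):

```r
library(party)  # provides cforest() and cforest_unbiased()

set.seed(1)
## Small synthetic stand-in for my real data: 20 numeric predictors
## plus a binary factor response "y"
n <- 500
dat <- data.frame(matrix(rnorm(n * 20), ncol = 20))
dat$y <- factor(rbinom(n, 1, 0.5))

## Same control settings as in my real run:
## mtry = 10, ntree = 50, maxsurrogate = 3
cf <- cforest(y ~ ., data = dat,
              controls = cforest_unbiased(ntree = 50L,
                                          mtry = 10L,
                                          maxsurrogate = 3L))
```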

After building the cforest I try to evaluate it with predict() on a dataset of ~30k observations, and I get the following error message:

Reached total allocation of 8067Mb: see help(memory.size)
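One workaround I'm experimenting with (sketched below on toy data, since I don't know the package internals) is calling predict() on the new data in chunks, so the prediction step at least doesn't process all 30k rows at once. Whether this actually lowers the peak allocation for a party cforest I can't say for sure:

```r
library(party)

set.seed(1)
## Toy training data and toy "new" data with matching columns
train <- data.frame(matrix(rnorm(500 * 20), ncol = 20))
train$y <- factor(rbinom(500, 1, 0.5))
newdata <- data.frame(matrix(rnorm(200 * 20), ncol = 20))

cf <- cforest(y ~ ., data = train,
              controls = cforest_unbiased(ntree = 25L, mtry = 5L))

## Predict in chunks instead of all rows at once; the chunk size is arbitrary.
## as.character() is used so the chunked factor results combine safely
## across R versions.
chunk_size <- 50
groups <- split(seq_len(nrow(newdata)),
                ceiling(seq_len(nrow(newdata)) / chunk_size))
preds <- unlist(lapply(groups, function(i)
  as.character(predict(cf, newdata = newdata[i, , drop = FALSE]))))
```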

I'm working on a desktop computer with Windows 7, the technical features are:

  • Processor: Intel Core i5-5300U CPU @ 2.30GHz
  • Installed memory (RAM): 8.00 GB (7.88 GB usable)
  • System type: 64-bit Operating System

Thank you very much for your time.

1 Answer


Using party you could build the trees separately and combine them later, but this is tedious. The devel version of partykit on R-Forge offers a reimplementation of ctree/cforest that aims at better memory efficiency.
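A minimal sketch of what switching to the partykit reimplementation might look like, on toy data. This assumes the devel cforest() keeps the documented partykit interface, where ntree and mtry are direct arguments and per-tree settings such as maxsurrogate go through ctree_control():

```r
## Assumes the devel partykit from R-Forge, installed e.g. via
## install.packages("partykit", repos = "http://R-Forge.R-project.org")
library(partykit)

set.seed(1)
dat <- data.frame(matrix(rnorm(500 * 20), ncol = 20))
dat$y <- factor(rbinom(500, 1, 0.5))

## In partykit, ntree/mtry are arguments of cforest() itself;
## maxsurrogate is passed through the ctree_control() object
cf <- cforest(y ~ ., data = dat,
              ntree = 50L, mtry = 10L,
              control = ctree_control(maxsurrogate = 3L))

p <- predict(cf, newdata = dat, type = "response")
```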

Torsten

  • Thank you very much for your response, Mr. Hothorn. I'm having some trouble running `cforest` with `partykit`: I get a strange error message, `Error in if (cov < .Machine$double.eps) return(c(-Inf, -Inf)) : missing value where TRUE/FALSE needed`, and I guess it is because some values of some of my predictors are very close to zero... It didn't happen with the `party` package, but I will fix that and then post the results back here. – D. Morales May 17 '17 at 14:18