I'm trying to run a cforest (party package) on a dataset of ~70k observations and ~105 variables, one of which is the binary response variable.
The cforest settings are mtry = 10, ntree = 50, maxsurrogate = 3.
The problem is that building the cforest takes too long (2 hours and 50 minutes), whereas ranger, for example, takes only 6 minutes for 500 trees with mtry = 10. I know the methodology behind each is quite different, but is this computational cost normal? Am I doing something wrong with the tuning parameters?
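For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two fits could be timed side by side. It is not the exact code used on the real data: the data frame `X`, its size, and the response `y` are synthetic stand-ins (assumptions), and passing `maxsurrogate` through `cforest_unbiased()` assumes the extra argument is forwarded to the underlying tree control settings.

```r
library(party)   # cforest(), cforest_unbiased()
library(ranger)  # ranger()

set.seed(1)

# Hypothetical small dataset standing in for the real ~70k x ~105 one
n <- 500
p <- 20
X <- as.data.frame(matrix(rnorm(n * p), nrow = n))   # columns V1..V20
X$y <- factor(rbinom(n, 1, plogis(X$V1 - X$V2)))     # binary response

# Conditional inference forest with the settings from the question
t_cf <- system.time(
  cf <- cforest(y ~ ., data = X,
                controls = cforest_unbiased(ntree = 50, mtry = 10,
                                            maxsurrogate = 3))
)

# ranger with 500 trees and the same mtry
t_rg <- system.time(
  rg <- ranger(y ~ ., data = X, num.trees = 500, mtry = 10)
)

# Elapsed wall-clock time in seconds for each fit
print(c(cforest = t_cf[["elapsed"]], ranger = t_rg[["elapsed"]]))
```

Increasing `n` and `p` toward the real dimensions should show how the gap between the two grows with the data size.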
After building the cforest, I try to evaluate it with predict() on a dataset of ~30k observations, and I receive the following error message:
Reached total allocation of 8067Mb: see help(memory.size)
I'm working on a desktop computer running Windows 7. Its technical specifications are:
- Processor: Intel core i5-5300U CPU @ 2.30GHz 2.30GHz
- Installed memory (RAM): 8.00 GB (7.88 GB usable)
- System type: 64-bit Operating System
Thank you very much for your time.