I have a large dataset in R (1M+ rows by 6 columns) that I want to use to train a random forest (using the randomForest package) for regression. Unfortunately, when I try to train on the whole dataset at once, I get the error "Error in matrix(0, n, n) : too many elements specified". When I run it on a subset of the data instead, I get "cannot allocate enough memory"-type errors, even with subsets as small as 10,000 or so observations.
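As a rough sanity check on the first error message: matrix(0, n, n) asks R for a dense n x n matrix of doubles at 8 bytes per element, so its size grows with the square of n. The sketch below assumes n is the number of training rows (the message itself does not say which object n comes from), and the helper dense_bytes is hypothetical, written only for this arithmetic.

```r
# Hypothetical helper: bytes needed for a dense n x n matrix of doubles,
# which is the shape matrix(0, n, n) allocates (8 bytes per element).
dense_bytes <- function(n) n^2 * 8

# Assumes n equals the number of training rows.
dense_bytes(1e6) / 1e12  # full dataset, in terabytes: 8
dense_bytes(1e4) / 1e6   # 10,000-row subset, in megabytes: 800
```

Under that assumption, the full 1M-row set would need roughly 8 TB for that single matrix, and even a 10,000-row subset would need about 800 MB, which would be consistent with allocation failures on fairly small subsets.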
Since I have no way to add more RAM to my machine, and random forests are well suited to the process I am trying to model, I would really like to make this work.
Any suggestions or workaround ideas are much appreciated.