
This seems like an obvious question, but I couldn't find anything so far. I want to train a random forest, but my data set is very big: it has only a few features but about 3 million rows.

If I train with a smaller sample everything works nicely, but if I use the whole data set my system runs out of memory (16 GB) and freezes. Is there a way to train an algorithm in batches with caret, something like `partial_fit` in sklearn?
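For reference, this is roughly what the working small-sample run looks like (just a sketch; `full_data`, the outcome column `y`, and the subsample size are illustrative):

```r
library(caret)

set.seed(42)
# take e.g. 100k of the ~3M rows so the fit stays within memory
small <- full_data[sample(nrow(full_data), 1e5), ]

fit <- train(
  y ~ .,
  data      = small,
  method    = "rf",  # randomForest backend
  trControl = trainControl(method = "cv", number = 5)
)
```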

creyesk
  • As far as I know this is not possible directly from caret. But you can manage it by combining [this answer](https://stackoverflow.com/questions/22261082/load-a-small-random-sample-from-a-large-csv-file-into-r-data-frame) or [this one](https://stackoverflow.com/questions/15532810/reading-40-gb-csv-file-into-r-using-bigmemory/18282037#18282037) with [this one](https://stackoverflow.com/questions/48590157/r-caret-estimate-parameters-on-a-subset-fit-to-full-data/48643166#48643166). The implementation would depend on details which your question lacks. – missuse Feb 06 '18 at 12:38
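To make the suggestion in the comment above concrete, a hedged sketch of the subset-tune / full-fit pattern it points to might look like this (not something caret supports as a single call; `full_data` and `y` are again illustrative names):

```r
library(caret)

set.seed(42)
# tune on a random subset that comfortably fits in memory
subset_data <- full_data[sample(nrow(full_data), 1e5), ]

# 1. Cross-validate the tuning parameter (mtry for "rf") on the subset only
tuned <- train(
  y ~ ., data = subset_data,
  method     = "rf",
  trControl  = trainControl(method = "cv", number = 5),
  tuneLength = 3
)

# 2. A single fit on the full data with the selected parameters;
#    method = "none" skips resampling and requires a one-row tuneGrid,
#    which is exactly what bestTune provides
final <- train(
  y ~ ., data = full_data,
  method    = "rf",
  trControl = trainControl(method = "none"),
  tuneGrid  = tuned$bestTune
)
```

The full-data fit in step 2 can still be memory-hungry; the first two links in the comment cover loading only a random sample of a large CSV into R in the first place.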

0 Answers