
I have to compute and keep in memory several (e.g., 20 or more) random forest models with Apache Spark.

I have only 8 GB available on the driver of the YARN cluster I use to launch the job, and I am running into OutOfMemory errors because the models do not fit in memory. I have already decreased spark.storage.memoryFraction to 0.1 to try to increase the amount of non-RDD memory.
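For reference, this is roughly how I set that option (a simplified sketch; the application name is just a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Driver memory is capped at 8 GB by the cluster, so I lower the
// storage fraction to leave more heap for non-RDD objects such as
// the trained models. The app name is a placeholder.
val conf = new SparkConf()
  .setAppName("many-random-forests")
  .set("spark.storage.memoryFraction", "0.1")
val sc = new SparkContext(conf)
```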

I thus have two questions:

  • How can I make these models fit in memory?
  • How can I check the size of my models? (see the sketch after this list)
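For the second question, a rough driver-side measurement might look like the following (this assumes org.apache.spark.util.SizeEstimator gives a usable approximation for MLlib RandomForestModel objects):

```scala
import org.apache.spark.mllib.tree.model.RandomForestModel
import org.apache.spark.util.SizeEstimator

// Rough estimate of the in-memory footprint of each model on the driver.
def reportModelSizes(models: Array[RandomForestModel]): Unit =
  models.zipWithIndex.foreach { case (model, i) =>
    val mb = SizeEstimator.estimate(model) / (1024.0 * 1024.0)
    println(f"model $i%d: ~$mb%.1f MB")
  }
```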

EDIT

I have 200 executors, each with 8 GB of memory.

I am not sure my models live on the driver, but I suspect they do, since I get OutOfMemory errors even though I have plenty of space on the executors. Furthermore, I store these models in an Array.
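Roughly, the training part looks like the sketch below (simplified; the number of trees, depth, and other parameters are placeholders, not my real settings):

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel
import org.apache.spark.rdd.RDD

// Simplified sketch: train one forest per dataset and keep all of the
// resulting models in an Array. trainClassifier returns the model to
// the driver, so every forest ends up in driver memory.
def trainAll(datasets: Seq[RDD[LabeledPoint]]): Array[RandomForestModel] =
  datasets.map { data =>
    RandomForest.trainClassifier(
      data,
      numClasses = 2,
      categoricalFeaturesInfo = Map[Int, Int](),
      numTrees = 100,
      featureSubsetStrategy = "auto",
      impurity = "gini",
      maxDepth = 10,
      maxBins = 32)
  }.toArray
```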

  • This question needs more details. What is the size of your executors? How many executors do you have? Why do your models live in the driver? – marios Feb 02 '16 at 16:57