I have to compute and keep in memory several (e.g. 20 or more) random forest models with Apache Spark.
I only have 8 GB available on the driver of the YARN cluster I use to launch the job, and I am facing OutOfMemory
errors because the models do not fit in memory. I have already decreased the ratio spark.storage.memoryFraction
to 0.1 to try to increase the non-RDD memory.
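
This is roughly how I set the configuration (the application name is a placeholder; the relevant part is the memoryFraction setting):

    import org.apache.spark.{SparkConf, SparkContext}

    // Lowering spark.storage.memoryFraction shrinks the RDD cache so that
    // more of the heap is left for everything else (including the models).
    val conf = new SparkConf()
      .setAppName("train-many-random-forests")
      .set("spark.storage.memoryFraction", "0.1")

    val sc = new SparkContext(conf)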
I thus have two questions:

- How could I make these models fit in memory?
- How could I check the size of my models?
EDIT
I have 200 executors, each with 8 GB of memory.
I am not sure whether the models live on the driver, but I suspect they do, since I get OutOfMemory
errors while there is plenty of free space on the executors. Furthermore, I store these models in Arrays, as in the sketch below.
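
To make the setup concrete, here is a simplified sketch of the training loop (the datasets and hyper-parameters are placeholders for what I actually use); all the trained models end up in an Array on the driver:

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.tree.RandomForest
    import org.apache.spark.mllib.tree.model.RandomForestModel
    import org.apache.spark.rdd.RDD

    // Train one random forest per training set and collect all of the
    // resulting models into a driver-side Array.
    def trainAll(trainingSets: Seq[RDD[LabeledPoint]]): Array[RandomForestModel] =
      trainingSets.map { data =>
        RandomForest.trainClassifier(
          data,
          2,               // numClasses
          Map[Int, Int](), // categoricalFeaturesInfo (no categorical features)
          100,             // numTrees
          "auto",          // featureSubsetStrategy
          "gini",          // impurity
          10,              // maxDepth
          32)              // maxBins
      }.toArray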