I'm using MLlib to train a random forest. It works fine up to depth 15, but with depth 20 I get
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
on the driver, from the collectAsMap operation in DecisionTree.scala, around line 642. It doesn't happen until a good hour into training. I'm using 50 trees on 36 slaves with maxMemoryInMB=250, and I still get the error even with a driver memory of 240G.
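A minimal sketch of the configuration described above, assuming the RDD-based spark.mllib API and a classification task. The object name, the LIBSVM input path taken from the first argument, the "auto" feature subset strategy, and the seed are placeholders for illustration, not the actual values from the failing job. Only numTrees=50, maxDepth=20, and maxMemoryInMB=250 come from the setup above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.configuration.Strategy
import org.apache.spark.mllib.util.MLUtils

// Hypothetical driver program; the name and input format are assumptions.
object RandomForestDepthRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RandomForestDepthRepro"))

    // Assumption: training data is in LIBSVM format at the path given in args(0).
    val data = MLUtils.loadLibSVMFile(sc, args(0))

    // Assumption: a classification task (the default strategy uses 2 classes).
    val strategy = Strategy.defaultStrategy("Classification")
    strategy.maxDepth = 20        // depth 15 trains fine, depth 20 fails
    strategy.maxMemoryInMB = 250  // memory budget for aggregating node statistics

    // 50 trees; "auto" and the seed 42 are placeholder choices.
    val model = RandomForest.trainClassifier(data, strategy, 50, "auto", 42)

    println(s"Trained ${model.numTrees} trees, ${model.totalNumNodes} nodes in total")
    sc.stop()
  }
}
```

The driver memory is not set in the code; it would be passed at submission time, for example with spark-submit's --driver-memory option.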
Has anybody seen this error in this context before, and can advise on what might be triggering it?
Best, Luke