We have a Spark 2.1 streaming application that uses mapWithState, with spark.streaming.dynamicAllocation.enabled=true. The pipeline is as follows:
val rdd_out = ssc.textFileStream(...)
  .map(convertToEvent(_))
  .combineByKey(..., new HashPartitioner(partitions))
  .mapWithState(stateSpec)
  .map(s => sessionAnalysis(s))
  .foreachRDD(rdd => rdd.toDF()....save(output))
The streaming app starts with 2 executors and, as expected, allocates additional executors as the load increases. The problem is that the load is not shared across those new executors.

The number of partitions is large enough to spill over to the new executors, and the keys are evenly distributed. I configured 40+ partitions, yet I only ever see 8 partitions (2 executors x 4 cores each) in the mapWithState storage. I expected that when new executors are allocated, those 8 partitions would be split up and reassigned to the new ones, but this never happens.
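For context, here is a minimal sketch of how such a StateSpec can be built with an explicit partitioner (the update function, Event, and Session types are simplified placeholders, not the actual application code):

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.{State, StateSpec}

// Placeholder types standing in for the real event/session model.
case class Event(data: String)
case class Session(count: Long)

// Hypothetical state-update function: counts events per key.
def trackSession(key: String,
                 event: Option[Event],
                 state: State[Session]): Option[Session] = {
  val updated = Session(state.getOption.map(_.count).getOrElse(0L) + 1)
  state.update(updated)
  Some(updated)
}

// StateSpec with an explicit partitioner; mapWithState uses this
// to partition the state RDD (here 40 partitions, matching the
// partition count described above).
val stateSpec = StateSpec
  .function(trackSession _)
  .partitioner(new HashPartitioner(40))
```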
Please advise.
Thanks,