I have a very large number of keys and a limited cluster size. I am using mapWithState to update my state, and as new data comes in, the number of keys keeps growing. On the Storage tab of the Spark UI, MapWithStateRDD is always shown as persisted in memory.
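For context, my job looks roughly like this simplified sketch (the word-count logic, paths, and the socket source are placeholders, not my actual pipeline):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object StatefulCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("mapWithState-example").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    // mapWithState requires checkpointing to be enabled.
    ssc.checkpoint("/tmp/checkpoint")

    // Keep one running count per key; the set of keys only ever grows.
    def updateCount(key: String, value: Option[Int], state: State[Long]): (String, Long) = {
      val newCount = state.getOption().getOrElse(0L) + value.getOrElse(0)
      state.update(newCount)
      (key, newCount)
    }

    ssc.socketTextStream("localhost", 9999) // placeholder input source
      .flatMap(_.split(" "))
      .map((_, 1))
      .mapWithState(StateSpec.function(updateCount _))
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```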
At line 109 of the source file MapWithStateDStream.scala, persist is called with the storage level hard-coded to MEMORY_ONLY. Does this mean my application will eventually crash (e.g., with an OutOfMemoryError) if I accumulate too many keys?
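For reference, the code I am referring to looks like this in the Spark Streaming source (an excerpt from InternalMapWithStateDStream; the exact line number and type parameters may vary between Spark versions):

```scala
// Excerpt from MapWithStateDStream.scala (not self-contained, shown for reference only)
private[streaming] class InternalMapWithStateDStream[K: ClassTag, V: ClassTag, S: ClassTag, E: ClassTag](
    parent: DStream[(K, V)], spec: StateSpecImpl[K, V, S, E])
  extends DStream[MapWithStateRDDRecord[K, S, E]](parent.context) {

  // The state RDD is always persisted with MEMORY_ONLY; the storage level
  // appears to be hard-coded rather than taken from user configuration.
  persist(StorageLevel.MEMORY_ONLY)
  // ...
}
```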