I have a Scala application on Spark GraphX. The vertex attribute (VD) contains a Map[Long, Map[Long, Double]] which needs to grow with each iteration. Both maps are created with List.toMap, so AFAIK both the inner and outer maps should be immutable. On very large graph data sets I have come to understand why the documentation for the Pregel API says that ideally the VD should not grow: I am getting the dreaded "Missing an output location for shuffle n partition m", i.e., an OOM.
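To make the setup concrete, here is a minimal sketch of the vertex attribute and the kind of merge that makes it grow. It is plain Scala with no Spark dependency, and the names `VdGrowthSketch`, `mergeVD` and `entryCount` are hypothetical, made up for illustration rather than taken from my actual code.

```scala
object VdGrowthSketch {
  // Hypothetical alias for the vertex attribute described above.
  type VD = Map[Long, Map[Long, Double]]

  // Hypothetical merge step: outer keys are unioned and inner maps are
  // combined, with incoming values winning on duplicate inner keys.
  // Both inputs are immutable, so a new VD is returned each time.
  def mergeVD(current: VD, incoming: VD): VD =
    incoming.foldLeft(current) { case (acc, (k, inner)) =>
      acc.updated(k, acc.getOrElse(k, Map.empty[Long, Double]) ++ inner)
    }

  // Total number of inner entries, a rough proxy for how large the VD is.
  def entryCount(vd: VD): Int = vd.valuesIterator.map(_.size).sum
}
```

Each call to `mergeVD` leaves the old attribute untouched and returns a larger one, which is the growth that the Pregel documentation warns about.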
So my question is this: how are immutable maps stored internally in Scala? If I had an idea of the memory usage of a map, then I could initialize each VD with some number of placeholder bytes that each vertex could "exchange" for map entries as the map grows, so that the overall size of the VD does not grow (significantly). This is not the most elegant solution, but I cannot think of another for this particular problem.
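What I have been able to check so far is only the concrete class behind a map, not its byte size. The sketch below assumes the standard library behaviour as I understand it: maps with up to four entries use small fixed-size classes (Map1 to Map4), larger ones switch to a hash-trie based HashMap, and Long keys and Double values are stored boxed. The object and method names are hypothetical. For an actual byte estimate, I believe Spark's `org.apache.spark.util.SizeEstimator.estimate` could be applied to a VD, but it is left out here to keep the sketch free of Spark dependencies.

```scala
object MapRepresentationSketch {
  // Runtime class name of the concrete immutable map implementation.
  def implName(m: Map[Long, Double]): String = m.getClass.getName

  // Two entries: expected to be one of the small fixed-size map classes.
  val small: Map[Long, Double] = List(1L -> 0.1, 2L -> 0.2).toMap

  // Five entries: expected to be backed by a hash-trie HashMap.
  val large: Map[Long, Double] =
    (1L to 5L).map(k => k -> k.toDouble).toList.toMap

  // Adding an entry returns a new map; `large` itself is not modified.
  val grown: Map[Long, Double] = large + (6L -> 6.0)
}
```

If this is right, per-entry overhead is not constant: it depends on which implementation class is in use and on the boxed keys and values, which makes a fixed "placeholder bytes" budget hard to size exactly.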
Alternatively, if someone could suggest a better way to handle this accumulation of data in the VD, I am open to that as well.