That's a really great question, to which I won't be able to give a fully detailed answer (I'll be following this question to see if better answers pop up) but I've been snooping around on the docs and found out some things.
I wasn't sure whether I should post this as an answer, because it ends with a few questions of my own but since it does answer your question to some degree I decided to post this as an answer. If this is not appropriate I'm happy to move this somewhere else.
Spark configuration docs
From the configuration docs, you can see the following about spark.memory.fraction
:
Fraction of (heap space - 300MB) used for execution and storage. The lower this is, the more frequently spills and cached data eviction occur. The purpose of this config is to set aside memory for internal metadata, user data structures, and imprecise size estimation in the case of sparse, unusually large records. Leaving this at the default value is recommended. For more detail, including important information about correctly tuning JVM garbage collection when increasing this value, see this description.
So we learn it contains:
- Internal metadata
- User data structures
- Imprecise size estimation in case of sparse, unusually large records
Spark tuning docs: memory management
Following the link in the docs, we get to the Spark tuning page. In there, we find a bunch of interesting info about the storage vs execution memory, but that is not what we're after in this question. There is another bit of text:
spark.memory.fraction expresses the size of M as a fraction of the (JVM heap space - 300MiB) (default 0.6). The rest of the space (40%) is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records.
and also
The value of spark.memory.fraction should be set in order to fit this amount of heap space comfortably within the JVM’s old or “tenured” generation. See the discussion of advanced GC tuning below for details.
So, this is a similar explanation and also a reference to garbage collection.
Spark tuning docs: garbage collection
When we go to the garbage collection page, we see a bunch of information about classical GC in Java. But there is a section that discusses spark.memory.fraction
:
In the GC stats that are printed, if the OldGen is close to being full, reduce the amount of memory used for caching by lowering spark.memory.fraction; it is better to cache fewer objects than to slow down task execution. Alternatively, consider decreasing the size of the Young generation. This means lowering -Xmn if you’ve set it as above. If not, try changing the value of the JVM’s NewRatio parameter. Many JVMs default this to 2, meaning that the Old generation occupies 2/3 of the heap. It should be large enough such that this fraction exceeds spark.memory.fraction.
What do I gather from this
As you have already said, the default spark.memory.fraction
is 0.6, so 40% is reserved for this "user memory". That is quite large. Which objects end up in there?
This is where I'm not sure, but I would guess the following:
- Internal metadata
- I don't expect this to be huge?
- User data structures
- This might be large (just intuition speaking here, not sure at all), and I would hope that someone with more knowledge about this would be able to give some good examples here.
- If you make intermediate structures during a map operation on a dataset, do they end up in user memory or in execution memory?
- Imprecise size estimation in the case of sparse, unusually large records
- Seems like this is only triggered in special cases, would be interesting to know where/how this gets decided.
- In some other place in the docs it is said safeguarding against OOM errors in the case of sparse and unusually large records. So it might be that this is more of a safety buffer than anything else?