
I am running a Spark application on YARN with the driver and executor memory set as --driver-memory 4G --executor-memory 2G.

When I run the application, an exception is thrown complaining: Container killed by YARN for exceeding memory limits. 2.5 GB of 2.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

What does this 2.5 GB mean here (the overhead memory, the executor memory, or overhead + executor memory)? I ask because when I change the memory settings to:

--driver-memory 4G --executor-memory 4G --conf spark.yarn.executor.memoryOverhead=2048, then the exception disappears.

Although I have boosted the overhead memory to 2G, that is still under 2.5G, so why does it work now?
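For reference, here are the same two configurations expressed programmatically through SparkConf; the keys simply mirror the spark-submit flags above (in practice these memory settings have to be supplied before the JVM starts, e.g. via spark-submit, but the config keys are the same):

    import org.apache.spark.SparkConf

    // Configuration that hits the YARN limit
    val failingConf = new SparkConf()
      .set("spark.driver.memory", "4g")
      .set("spark.executor.memory", "2g")

    // Configuration that works
    val workingConf = new SparkConf()
      .set("spark.driver.memory", "4g")
      .set("spark.executor.memory", "4g")
      .set("spark.yarn.executor.memoryOverhead", "2048")  // MB of off-heap overhead per executor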

Tom
  • Possible duplicate of https://stackoverflow.com/questions/49988475/why-increase-spark-yarn-executor-memoryoverhead – Amit Kumar Jul 09 '18 at 02:30
  • @AmitKumar it is not a duplicate – Tom Jul 09 '18 at 03:38
  • But it contains the explanation for the question you have asked here. You can follow the Spark reference link inside it; I think that explains pretty much what you asked for. – Amit Kumar Jul 09 '18 at 04:14

1 Answer


Let us understand how memory is divided among the various regions in Spark.

  1. Executor memoryOverhead:

spark.yarn.executor.memoryOverhead = max(384 MB, 0.07 * spark.executor.memory). In your first case, memoryOverhead = max(384 MB, 0.07 * 2 GB) = max(384 MB, 143.36 MB), hence memoryOverhead = 384 MB is reserved in each executor, assuming you have assigned a single core per executor. (The arithmetic for all three regions is sketched in the code after this list.)

  2. Execution and Storage Memory:

By default spark.memory.fraction = 0.6, which implies that execution and storage, as a unified region, occupy 60% of the remaining memory, i.e. ~998 MB. There is no strict boundary allocated to each region unless you enable spark.memory.useLegacyMode; otherwise they share a moving boundary.

  3. User Memory:

This is the memory pool that remains after the allocation of Execution and Storage Memory, and it is completely up to you to use it however you like. You can store your own data structures there that are used in RDD transformations. For example, you can rewrite a Spark aggregation with the mapPartitions transformation, maintaining a hash table for the aggregation to run (a small sketch of this follows after the list). This comprises the remaining 40% of the memory left after memoryOverhead; in your case it is ~660 MB.
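To make the arithmetic in points 1–3 concrete, here is a rough, illustrative Scala sketch for the original --executor-memory 2G case. The 300 MB reserved heap is Spark's unified-memory-manager default and the YARN rounding note are assumptions not stated above; the ~998 MB / ~660 MB figures quoted earlier come from the JVM-reported heap, which is slightly smaller than the raw 2048 MB used here.

    object MemoryBreakdown extends App {
      val executorMemoryMb = 2048                                 // --executor-memory 2G

      // 1. memoryOverhead = max(384 MB, 0.07 * executor memory)
      val overheadMb = math.max(384, (0.07 * executorMemoryMb).toInt)  // 384 MB

      // Spark reserves ~300 MB of the heap before splitting the remainder
      val usableMb = executorMemoryMb - 300                       // 1748 MB

      // 2. unified execution + storage region: spark.memory.fraction = 0.6
      val unifiedMb = (usableMb * 0.6).toInt                      // ~1048 MB (the answer quotes ~998 MB,
                                                                  // based on the smaller JVM-reported heap)
      // 3. user memory: the remaining 40%
      val userMb = usableMb - unifiedMb                           // ~700 MB (the answer quotes ~660 MB)

      // YARN's container limit is executor memory plus the overhead; the 2.5 GB in the
      // error message is presumably this figure after YARN rounds the request up to its
      // allocation increment (an assumption about the cluster settings).
      val containerMb = executorMemoryMb + overheadMb             // 2432 MB

      println(s"overhead=$overheadMb MB, unified=$unifiedMb MB, user=$userMb MB, container=$containerMb MB")
    }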
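And here is a minimal sketch of the kind of hand-rolled aggregation the User Memory point alludes to. The RDD[(String, Int)] element type and the function name are just illustrative, and this only combines values within each partition (like a map-side combine); the point is that the hash map is an ordinary Scala structure accounted against user memory, not the execution/storage region.

    import scala.collection.mutable

    import org.apache.spark.rdd.RDD

    // Combine values per key within each partition using a hand-maintained hash map.
    def sumWithinPartitions(pairs: RDD[(String, Int)]): RDD[(String, Int)] =
      pairs.mapPartitions { iter =>
        val acc = mutable.HashMap.empty[String, Int]   // lives in user memory
        iter.foreach { case (k, v) => acc(k) = acc.getOrElse(k, 0) + v }
        acc.iterator                                   // one (key, partial sum) per key in this partition
      }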

If your job does not fit within any of the above allocations, it is highly likely to end up with OOM problems.

pushpavanthar