I am trying to join two large spark dataframes and keep running into this error:

Container killed by YARN for exceeding memory limits. 24 GB of 22 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

This seems like a common issue among spark users, but I can't seem to find any solid descriptions of what spark.yarn.executor.memoryOverhead is. In some cases it sounds like it's a kind of memory buffer before YARN kills the container (e.g. 10GB was requested, but YARN won't kill the container until it uses 10.2GB). In other cases it sounds like it's being used to do some kind of data accounting tasks that are completely separate from the analysis that I want to perform. My questions are:

  • What is spark.yarn.executor.memoryOverhead being used for?
  • What is the benefit of increasing this kind of memory instead of executor memory (or the number of executors)?
  • In general, are there steps I can take to reduce my spark.yarn.executor.memoryOverhead usage (e.g. particular data structures, limiting the width of the dataframes, using fewer executors with more memory, etc.)?
Fortunato

1 Answer

Overhead options are nicely explained in the configuration document:

This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).

This also includes user objects if you use one of the non-JVM guest languages (Python, R, etc...).
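
To make the relationship between executor memory and overhead concrete, here is a rough sketch of how the container size requested from YARN is put together. It assumes the default rule from the Spark configuration documentation (overhead = 10% of executor memory, with a 384 MiB minimum); older Spark releases used a different factor, so treat the constants as assumptions. The function names are hypothetical, and any rounding YARN applies to the request is ignored.

```python
# Assumed defaults, taken from the Spark configuration docs; check your version.
MIN_OVERHEAD_MIB = 384   # assumed documented minimum overhead
OVERHEAD_FACTOR = 0.10   # assumed default overhead factor


def default_overhead_mib(executor_memory_mib, factor=OVERHEAD_FACTOR):
    """Overhead Spark would add when memoryOverhead is not set explicitly."""
    return max(MIN_OVERHEAD_MIB, int(executor_memory_mib * factor))


def container_request_mib(executor_memory_mib, overhead_mib=None):
    """Total memory requested from YARN per executor: heap plus overhead."""
    if overhead_mib is None:
        overhead_mib = default_overhead_mib(executor_memory_mib)
    return executor_memory_mib + overhead_mib


if __name__ == "__main__":
    heap = 20 * 1024  # hypothetical 20 GiB executor heap
    print(default_overhead_mib(heap))   # overhead in MiB
    print(container_request_mib(heap))  # total container size in MiB
```

Under these assumptions a 20 GiB heap yields a 22 GiB container. YARN enforces the limit on the whole container, so off-heap usage beyond the overhead share gets the container killed even when the JVM heap itself is not full.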

Alper t. Turker
  • Can you explain what interned strings are? Does it mean that when I have a lot of strings in my data I get out-of-memory errors because I have too many interned strings off heap? – Joha May 22 '18 at 12:04
  • @Joha String interning is the process of storing only a single copy of each unique string and referencing it wherever the same value is used (using some form of lookup table). Different languages take different approaches: for example, Python interns only short strings; R, as far as I am aware, interns all of them; and Java, if I am not mistaken, interns `String` constants by default. – Alper t. Turker May 22 '18 at 12:31
  • descriptive answer here https://stackoverflow.com/a/51238429/3213772 – pushpavanthar Jan 07 '20 at 11:27
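
The interning behaviour described in the comments above can be illustrated with Python's `sys.intern`. This is only a small sketch of the general idea (one shared copy per distinct value); it says nothing about how the JVM stores interned strings or how much overhead memory they take, and the example strings are made up.

```python
import sys

# Build two equal strings at runtime so they are not simply the same literal.
parts = ["spark", "executor", "memoryOverhead"]
a = ".".join(parts)
b = ".".join(parts)

# sys.intern returns the canonical copy kept in the interpreter's intern table.
a_interned = sys.intern(a)
b_interned = sys.intern(b)

print(a == b)                    # compares the values
print(a_interned is b_interned)  # checks whether both names share one object
```

After interning, both names refer to the same object, which is the lookup-table effect the comment describes.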