I launched a Spark job with these settings (among others):
spark.driver.maxResultSize 11GB
spark.driver.memory 12GB
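
For reference, here is a minimal sketch of how I pass such settings; the app name and the builder-based launch are illustrative assumptions, not my exact setup:

    from pyspark.sql import SparkSession

    # Illustrative only: the app name is made up.
    spark = (
        SparkSession.builder
        .appName("my-job")
        .config("spark.driver.maxResultSize", "11g")
        # Note: spark.driver.memory only takes effect if set before the
        # driver JVM starts (e.g. via spark-submit --driver-memory or
        # spark-defaults.conf); setting it here is ignored in client mode.
        .config("spark.driver.memory", "12g")
        .getOrCreate()
    )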
I was debugging my PySpark job, and it kept giving me the error:
serialized results of 16 tasks (17.4 GB) is bigger than spark.driver.maxResultSize (11 GB)
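
For context, this error shows up on an action that pulls task results back to the driver. A self-contained sketch of the kind of call that triggers it (the DataFrame here is a small placeholder; my real data was much larger):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Small placeholder DataFrame; in my real job the collected
    # results were tens of gigabytes.
    big_df = spark.range(10**6)

    # collect() serializes each task's partition results and ships them
    # to the driver; their combined size is what Spark checks against
    # spark.driver.maxResultSize.
    rows = big_df.collect()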
So, I increased spark.driver.maxResultSize to 18 GB in the configuration settings, and it worked!
Now, this is interesting, because in both cases spark.driver.memory was SMALLER than the serialized results returned. Why is this allowed? I would have assumed it was impossible, since the serialized results were 17.4 GB while I was debugging, which is more than the driver's memory of 12 GB shown above. How is this possible?