
I have an Azure Databricks Spark cluster of 6 nodes (5 workers + 1 driver), each with 16 cores and 64GB of memory.

I'm running a PySpark notebook that does the following (a minimal sketch of the code comes right after this list):

  1. reads a DF from parquet files.
  2. caches it (df.cache()).
  3. executes an action on it (df.toPandas()).
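
In code, the whole thing boils down to something like this (the parquet path is just a placeholder for my actual location):

    # read, cache and materialise -- toPandas() is the action that actually fills the cache
    df = spark.read.parquet("dbfs:/mnt/path/to/data")   # placeholder path
    df.cache()                                          # lazily marks the DataFrame for caching
    pdf = df.toPandas()                                 # runs the job and collects everything to the driver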

From the Spark UI's Storage tab I see that the cached DF takes up 9.6GB in memory, divided into 28 files, using 3GB+ of on-heap memory on 3 of the workers:

[screenshots: Spark UI Storage tab showing the cached DataFrame and its per-executor on-heap usage]

At this point, I see from the mem_report on Ganglia that the 3 workers' on-heap memory is being used (i.e. the 40g -- see the Spark configs below).

Next, I clear the DF from the cache (df.unpersist(True)), and after doing that I correctly see the storage entry gone and the workers' storage memory (almost) emptied:

[screenshot: Spark UI Storage tab, now empty after the unpersist]
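
The cleanup itself is just the call below; the storageLevel check is only there to confirm the DataFrame is no longer marked as cached:

    df.unpersist(True)       # blocking unpersist: waits until all cached blocks are dropped
    print(df.storageLevel)   # reports StorageLevel(False, False, False, False, 1), i.e. not cached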

but my workers' executor memory is never released (not even after I detach my notebook from the cluster):

[screenshot: Ganglia memory report, executor memory still in use]
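
For what it's worth, the Spark monitoring REST API only exposes storage memory per executor, so it confirms the cached blocks are gone but says nothing about the heap -- that part I can only see in Ganglia. This is roughly how I poll it from the notebook (assuming the driver UI is reachable from the notebook process, which should hold since the notebook runs on the driver):

    import requests

    sc = spark.sparkContext
    url = "{}/api/v1/applications/{}/executors".format(sc.uiWebUrl, sc.applicationId)
    for e in requests.get(url).json():
        # memoryUsed / maxMemory here refer to storage memory, not the JVM heap
        print(e["id"], e["rddBlocks"], e["memoryUsed"], e["maxMemory"])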

My question is, how can I get the workers to clear their executor memory? Is it a GC problem (setting G1GC didn't help either -- see below)?

Thanks!


These are my relevant Spark config settings:

spark.executor.memory 40g
spark.memory.storageFraction .6
spark.databricks.io.cache.enabled true
spark.cleaner.periodicGC.interval 2m
spark.sql.execution.arrow.enabled true
spark.storage.cleanupFilesAfterExecutorExit true
spark.worker.cleanup.enabled true
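
As a sanity check that these settings actually reach the running cluster, I print them back from the notebook (keys that never made it show up as '<not set>'):

    conf = spark.sparkContext.getConf()
    for key in ["spark.executor.memory",
                "spark.memory.storageFraction",
                "spark.cleaner.periodicGC.interval",
                "spark.sql.execution.arrow.enabled",
                "spark.storage.cleanupFilesAfterExecutorExit"]:
        print(key, "=", conf.get(key, "<not set>"))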

Setting G1GC as follows did not have an impact on the memory:

spark.executor.extraJavaOptions -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=25 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps

For the purposes of my experiment, nothing else is running in the cluster before or after the job execution.
