
I'm running Spark in local mode with the following options:

spark-shell --driver-memory 21G --executor-memory 10G --num-executors 4 --driver-java-options "-Dspark.executor.memory=10G"  --executor-cores 8

It is a four-node cluster with 32 GB of RAM on each node.

I computed column similarities using DIMSUM and am trying to write the result to a file.

The job computed column similarities for 6.7 million items, but when persisting them to a file it runs into thread-spilling issues:

dimSumOutput.coalesce(1, true).saveAsTextFile("/user/similarity")

dimSumOutput is an RDD that contains the column similarities in the format (row, col, sim).

16/03/20 21:41:22 INFO spark.ContextCleaner: Cleaned shuffle 2
16/03/20 21:41:25 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (1 time so far)
16/03/20 21:41:26 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (1 time so far)
16/03/20 21:41:26 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (1 time so far)
16/03/20 21:41:28 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (1 time so far)
16/03/20 21:41:31 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 535.0 MB to disk (1 time so far)
16/03/20 21:41:32 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 609.3 MB to disk (1 time so far)
16/03/20 21:42:07 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 481.3 MB to disk (2 times so far)
16/03/20 21:42:14 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (2 times so far)
16/03/20 21:42:18 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (2 times so far)
16/03/20 21:42:21 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 491.5 MB to disk (2 times so far)
16/03/20 21:42:27 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 542.7 MB to disk (2 times so far)
16/03/20 21:42:32 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 583.7 MB to disk (2 times so far)
16/03/20 21:43:25 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (3 times so far)
16/03/20 21:43:33 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (3 times so far)
16/03/20 21:43:45 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 483.8 MB to disk (3 times so far)
16/03/20 21:43:50 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (3 times so far)
16/03/20 21:43:56 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 535.0 MB to disk (3 times so far)
16/03/20 21:44:01 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 624.6 MB to disk (3 times so far)
16/03/20 21:44:14 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 482.6 MB to disk (4 times so far)
16/03/20 21:44:20 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (4 times so far)
16/03/20 21:44:37 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (4 times so far)
16/03/20 21:45:09 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (4 times so far)
16/03/20 21:45:22 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 581.1 MB to disk (4 times so far)
16/03/20 21:45:23 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 539.5 MB to disk (4 times so far)
16/03/20 21:45:28 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (5 times so far)
16/03/20 21:45:40 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 486.4 MB to disk (5 times so far)
16/03/20 21:45:52 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (5 times so far)
16/03/20 21:45:59 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (5 times so far)
16/03/20 21:46:14 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (6 times so far)
16/03/20 21:46:24 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 539.6 MB to disk (5 times so far)
16/03/20 21:46:25 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 527.4 MB to disk (5 times so far)
16/03/20 21:47:11 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 576.0 MB to disk (6 times so far)
16/03/20 21:47:19 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 491.5 MB to disk (6 times so far)
16/03/20 21:47:20 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (6 times so far)
16/03/20 21:47:43 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 686.1 MB to disk (7 times so far)
16/03/20 21:47:50 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 539.5 MB to disk (6 times so far)
16/03/20 21:47:57 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 599.0 MB to disk (6 times so far)
16/03/20 21:48:04 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 481.3 MB to disk (7 times so far)
16/03/20 21:48:39 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (7 times so far)
16/03/20 21:48:40 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (7 times so far)
16/03/20 21:49:06 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (8 times so far)
16/03/20 21:49:21 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 519.5 MB to disk (7 times so far)
16/03/20 21:49:21 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 489.0 MB to disk (8 times so far)
16/03/20 21:49:28 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 540.2 MB to disk (7 times so far)
16/03/20 21:49:36 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 485.1 MB to disk (8 times so far)
16/03/20 21:49:39 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 601.6 MB to disk (8 times so far)
16/03/20 21:50:04 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 576.0 MB to disk (9 times so far)
16/03/20 21:50:20 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 519.7 MB to disk (8 times so far)
16/03/20 21:50:24 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (9 times so far)
16/03/20 21:50:27 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 539.5 MB to disk (8 times so far)
16/03/20 21:50:28 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (9 times so far)
16/03/20 21:51:03 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 489.0 MB to disk (9 times so far)
16/03/20 21:51:22 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (10 times so far)
16/03/20 21:51:41 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 519.5 MB to disk (9 times so far)
16/03/20 21:51:45 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 483.8 MB to disk (10 times so far)
16/03/20 21:51:45 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (10 times so far)
16/03/20 21:51:51 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 550.4 MB to disk (9 times so far)
16/03/20 21:52:04 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (10 times so far)
16/03/20 21:52:20 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 509.4 MB to disk (11 times so far)
16/03/20 21:52:40 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (11 times so far)

Any pointers on how to fix it?

tourist

1 Answer


1) It is weird that you're using --executor-memory 65G (bigger than your 32 GB per node!) and then, on the same command line, --driver-java-options "-Dspark.executor.memory=10G". Is it a typo? If not, are you sure about the effects of such a call? Please provide more information.

2) More to your point: after the data is processed by your 4 workers, you're asking Spark to coalesce it into a single partition (and thus onto a single executor). Depending on the memory assigned to that executor (see 1), this probably means a single executor has to handle a number of records that is too big. I would first verify how much memory is actually assigned to the executors (see the Spark UI, and the YARN UI if you use it). Then I would seriously reconsider the need to coalesce to 1. Also, as @Yaron suggested, you might take a look at the shuffle-related settings of your application and change spark.shuffle.memoryFraction (keeping in mind the maximum of 0.8 when summing it with spark.storage.memoryFraction); just note that newer versions of Spark consider these settings deprecated.
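If a single output file is not strictly required, one way to avoid shuffling everything onto one executor is to write with the RDD's existing partitioning and merge the part-files afterwards. A minimal sketch, assuming dimSumOutput holds (row, col, sim) tuples as described in the question (the map step that formats each line is illustrative, not from the original code):

```scala
// Write one part-file per partition; no shuffle onto a single executor.
dimSumOutput
  .map { case (row, col, sim) => s"$row,$col,$sim" } // format assumed for illustration
  .saveAsTextFile("/user/similarity")
```

If the output lives on HDFS, the part-files can then be combined outside Spark, e.g. with `hdfs dfs -getmerge /user/similarity similarity.txt`, which avoids paying the merge cost inside a single executor's memory.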

mauriciojost
  • note that in the latest Spark versions (1.6.x) the memoryFraction settings are deprecated – if you set them they will be ignored, unless you revert to the old memory settings by switching `spark.memory.useLegacyMode` to `true`. See http://spark.apache.org/docs/1.6.1/configuration.html#memory-management – Tzach Zohar Mar 21 '16 at 08:20
  • @mauriciojost that was a typo, thanks for pointing it out; I've updated the question. I tried changing spark.shuffle.memoryFraction to 0.7 and spark.storage.memoryFraction to 0.1, which sum to 0.8, but I'm still facing the same issue of threads spilling to disk. I'm trying this in local mode; could that be the cause, or is there an issue with the configuration I'm passing to spark-shell? – tourist Mar 22 '16 at 06:30
  • @tourist Data that needs to br coalesced will be sent to a single executor, hence it needs to fit in its memory. There are many variables that you can modify: amount of records (but I think it is fixed in your example), size of the record in memory (use Kryo to compress them, remember to force register your classes on it to make sure you get the most out of the compression), increase the memory on the executors, or the best one: write in parallel, avoiding coalesce (1). Let me know if it helps to update only once my answer. – mauriciojost Mar 22 '16 at 07:25
  • @mauriciojost I removed the coalesce step; I'll think about merging once the task is done. I'm using the following properties: spark-shell --driver-memory 21G --executor-memory 32G --conf "spark.rdd.compress=true" --conf "spark.shuffle.memoryFraction=1" --conf "spark.kryoserializer.buffer.max=256m" --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --num-executors 4 --executor-cores 16. Let me know if I can tweak anything more. – tourist Mar 22 '16 at 07:29
  • I cannot remember the previous output exactly, but I see that now all 4 executors are spilling, which kind of confirms the shuffle-during-coalesce hypothesis. Are you sure you have a good enough reason to keep the coalesce and sacrifice performance? – mauriciojost Mar 22 '16 at 07:30
  • no, it didn't work :( It's leading to an OOM exception; can we tweak any parameters? – tourist Mar 22 '16 at 07:32
  • So you removed one cause of shuffling but you still get a shuffle somewhere. Does the shuffling then occur earlier in your algorithm? Could you describe the DAG that Spark has resolved for your action saveAsTextFile("/user/similarity")? – mauriciojost Mar 22 '16 at 07:35
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/106981/discussion-between-tourist-and-mauriciojost). – tourist Mar 22 '16 at 07:37