I'm working with Spark in local mode with the following options:
spark-shell --driver-memory 21G --executor-memory 10G --num-executors 4 --driver-java-options "-Dspark.executor.memory=10G" --executor-cores 8
It is a four-node cluster with 32G of RAM on each node.
I computed column similarities using DIMSUM and am trying to write them to a file.
The similarities were computed for 6.7 million items, but persisting them to a file leads to thread spilling issues:
dimSumOutput.coalesce(1, true).saveAsTextFile("/user/similarity")
dimSumOutput is an RDD which contains the column similarities in the format (row, col, sim).
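For context, here is a minimal sketch of how an RDD in this (row, col, sim) format can be produced with MLlib's DIMSUM implementation. It is not the actual job: the toy 3x3 matrix and the 0.1 threshold are assumptions for illustration only, and `sc` is the SparkContext that spark-shell provides.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{MatrixEntry, RowMatrix}
import org.apache.spark.rdd.RDD

// Toy input (assumption): 3 rows x 3 columns.
// The real matrix has about 6.7 million columns (items).
val rows = sc.parallelize(Seq(
  Vectors.dense(1.0, 0.0, 2.0),
  Vectors.dense(0.0, 3.0, 4.0),
  Vectors.dense(5.0, 6.0, 0.0)
))

val matrix = new RowMatrix(rows)

// A threshold greater than 0 makes columnSimilarities use DIMSUM sampling;
// 0.1 is an assumed value, not the one used in the real job.
val similarities = matrix.columnSimilarities(0.1)

// The result is an upper-triangular CoordinateMatrix; flatten each entry
// into the (row, col, sim) tuple format described above.
val dimSumOutput: RDD[(Long, Long, Double)] =
  similarities.entries.map { case MatrixEntry(row, col, sim) => (row, col, sim) }

// Peek at a few tuples instead of writing everything out.
dimSumOutput.take(3).foreach(println)
```

Note that in `coalesce(1, true)` the second argument is the `shuffle` flag, so that call moves every tuple into a single partition before the write.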
16/03/20 21:41:22 INFO spark.ContextCleaner: Cleaned shuffle 2
16/03/20 21:41:25 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (1 time so far)
16/03/20 21:41:26 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (1 time so far)
16/03/20 21:41:26 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (1 time so far)
16/03/20 21:41:28 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (1 time so far)
16/03/20 21:41:31 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 535.0 MB to disk (1 time so far)
16/03/20 21:41:32 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 609.3 MB to disk (1 time so far)
16/03/20 21:42:07 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 481.3 MB to disk (2 times so far)
16/03/20 21:42:14 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (2 times so far)
16/03/20 21:42:18 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (2 times so far)
16/03/20 21:42:21 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 491.5 MB to disk (2 times so far)
16/03/20 21:42:27 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 542.7 MB to disk (2 times so far)
16/03/20 21:42:32 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 583.7 MB to disk (2 times so far)
16/03/20 21:43:25 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (3 times so far)
16/03/20 21:43:33 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (3 times so far)
16/03/20 21:43:45 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 483.8 MB to disk (3 times so far)
16/03/20 21:43:50 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (3 times so far)
16/03/20 21:43:56 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 535.0 MB to disk (3 times so far)
16/03/20 21:44:01 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 624.6 MB to disk (3 times so far)
16/03/20 21:44:14 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 482.6 MB to disk (4 times so far)
16/03/20 21:44:20 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (4 times so far)
16/03/20 21:44:37 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (4 times so far)
16/03/20 21:45:09 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (4 times so far)
16/03/20 21:45:22 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 581.1 MB to disk (4 times so far)
16/03/20 21:45:23 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 539.5 MB to disk (4 times so far)
16/03/20 21:45:28 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (5 times so far)
16/03/20 21:45:40 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 486.4 MB to disk (5 times so far)
16/03/20 21:45:52 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (5 times so far)
16/03/20 21:45:59 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (5 times so far)
16/03/20 21:46:14 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (6 times so far)
16/03/20 21:46:24 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 539.6 MB to disk (5 times so far)
16/03/20 21:46:25 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 527.4 MB to disk (5 times so far)
16/03/20 21:47:11 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 576.0 MB to disk (6 times so far)
16/03/20 21:47:19 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 491.5 MB to disk (6 times so far)
16/03/20 21:47:20 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (6 times so far)
16/03/20 21:47:43 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 686.1 MB to disk (7 times so far)
16/03/20 21:47:50 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 539.5 MB to disk (6 times so far)
16/03/20 21:47:57 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 599.0 MB to disk (6 times so far)
16/03/20 21:48:04 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 481.3 MB to disk (7 times so far)
16/03/20 21:48:39 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (7 times so far)
16/03/20 21:48:40 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (7 times so far)
16/03/20 21:49:06 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (8 times so far)
16/03/20 21:49:21 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 519.5 MB to disk (7 times so far)
16/03/20 21:49:21 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 489.0 MB to disk (8 times so far)
16/03/20 21:49:28 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 540.2 MB to disk (7 times so far)
16/03/20 21:49:36 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 485.1 MB to disk (8 times so far)
16/03/20 21:49:39 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 601.6 MB to disk (8 times so far)
16/03/20 21:50:04 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 576.0 MB to disk (9 times so far)
16/03/20 21:50:20 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 519.7 MB to disk (8 times so far)
16/03/20 21:50:24 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (9 times so far)
16/03/20 21:50:27 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 539.5 MB to disk (8 times so far)
16/03/20 21:50:28 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (9 times so far)
16/03/20 21:51:03 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 489.0 MB to disk (9 times so far)
16/03/20 21:51:22 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (10 times so far)
16/03/20 21:51:41 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 519.5 MB to disk (9 times so far)
16/03/20 21:51:45 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 483.8 MB to disk (10 times so far)
16/03/20 21:51:45 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (10 times so far)
16/03/20 21:51:51 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 550.4 MB to disk (9 times so far)
16/03/20 21:52:04 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (10 times so far)
16/03/20 21:52:20 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 509.4 MB to disk (11 times so far)
16/03/20 21:52:40 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (11 times so far)
Any pointers on how to fix it?