I am testing Spark-1.5.1 with different G1 configurations and observe that my application takes 2 min to complete with MaxGCPauseMillis = 200 (default) and 4 min with MaxGCPauseMillis = 1. The heap usage is depicted below. The statistics below show that the GC time of the two configurations differs by only about 5 sec.
I am wondering why the execution time increases by this much.
Some statistics:
MaxGCPauseMillis = 200 - No. young GCs: 67; GC time of an executor: 9.8 sec
MaxGCPauseMillis = 1 - No. young GCs: 224; GC time of an executor: 14.7 sec
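To make the gap concrete, here is a small sketch that derives the average young-GC pause and the deltas from the figures above. The names `GcRun` and `GcStats` are made up for illustration, and it assumes the reported executor GC time is spent entirely in the listed young collections.

```scala
// Illustrative only: GcRun and GcStats are hypothetical helper names.
// The numbers are copied from the statistics above.
final case class GcRun(maxPauseMillis: Int, youngGcs: Int, gcTimeSec: Double) {
  // Average pause per young collection, assuming all GC time is young GC.
  def avgPauseMs: Double = gcTimeSec * 1000.0 / youngGcs
}

object GcStats {
  val default200 = GcRun(maxPauseMillis = 200, youngGcs = 67, gcTimeSec = 9.8)
  val target1    = GcRun(maxPauseMillis = 1, youngGcs = 224, gcTimeSec = 14.7)

  // Extra GC time per executor versus extra wall-clock time (2 min -> 4 min).
  val extraGcSec: Double   = target1.gcTimeSec - default200.gcTimeSec
  val extraWallSec: Double = (4 - 2) * 60.0

  def main(args: Array[String]): Unit = {
    Seq(default200, target1).foreach { r =>
      println(f"MaxGCPauseMillis=${r.maxPauseMillis}%d: avg young pause ${r.avgPauseMs}%.1f ms")
    }
    println(f"extra GC time: $extraGcSec%.1f s, extra run time: $extraWallSec%.0f s")
  }
}
```

Under that assumption the average pause is roughly 146 ms with the default and roughly 66 ms with the 1 ms target, so about 5 sec of additional GC time per executor sits next to 120 sec of additional run time.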
The red area is the young generation, the black area is the old generation. The application runs on 10 nodes with 1 executor and a 6 GB heap each.
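For reference, a setup like this could be expressed roughly as follows. This is only a sketch: the application name is a placeholder, `-XX:+UseG1GC` is assumed to be how G1 is enabled, and the node and executor count is left to the cluster manager. The flags could equally be passed with `--conf` on `spark-submit`.

```scala
import org.apache.spark.SparkConf

// Sketch only: the app name is a placeholder and the exact flag set is an assumption.
val conf = new SparkConf()
  .setAppName("WordCountG1Test")
  // 6 GB heap per executor, as described above.
  .set("spark.executor.memory", "6g")
  // Enable G1 on the executors and set the pause-time target (200 is the default; 1 is the second run).
  .set("spark.executor.extraJavaOptions", "-XX:+UseG1GC -XX:MaxGCPauseMillis=1")
```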
The application is a Word Count example:
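// SPACE is defined elsewhere in the application; assumed here to be a single-space pattern.
val SPACE = java.util.regex.Pattern.compile(" ")
// Read the input file (args(0)) with a minimum of 1 partition.
val lines = sc.textFile(args(0), 1)
val words = lines.flatMap(l => SPACE.split(l))
val ones = words.map(w => (w, 1))
// Sum the counts per word; this step shuffles data between executors.
val counts = ones.reduceByKey(_ + _)
//val output = counts.collect()
//output.foreach(t => println(t._1 + ": " + t._2))
counts.saveAsTextFile(args(1))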