
I have a Spark program that I'm running on EMR. I noticed that when I change my cluster to have more cores (by choosing an instance type with more cores), 1) the job takes considerably longer, and 2) it never actually completes because it errors out.

Specifically, my job takes 4 minutes to complete when I use 19 c3.4xlarge slave nodes (so 304 cores), with 57 executors and 1140 partitions. But when I change to 20 c4.8xlarge slave nodes (so 720 cores) with 140 executors and 2800 partitions, it fails after 22 minutes.

Why is this happening? I would expect that increasing the number of cores (and, in effect, the number of partitions) would speed the job up. I'm also unsure why the second scenario fails at all.

In both cases, I have approximately 5 cores per executor and 4 times as many partitions as there are cores (assuming one core per node is reserved for system tasks and YARN agents).
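In case the shape of the job matters, below is a simplified sketch of how it is structured. The names and transformation bodies are illustrative stand-ins, not my real code, but the structure (partition count passed in as a program argument, a small RDD flatMapped and filtered into a much larger one, then cached) matches what the real job does:

import org.apache.spark.sql.SparkSession

// Simplified sketch of the job structure; the data and transformations here are
// illustrative stand-ins, not the real ones.
object someMainClass {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("myEmrJob").getOrCreate()
    val sc = spark.sparkContext

    // Partition count arrives as a program argument: 1140 on the c3.4xlarge
    // cluster, 2800 on the c4.8xlarge cluster (~4x the usable core count).
    val numPartitions = args(0).toInt

    // The first RDD holds roughly 7,000 objects; a flatMap + filter expands it
    // to roughly 400,000 objects, which is then cached (~500 GB in memory).
    val objects = sc.parallelize(1 to 7000, numPartitions)
    val expanded = objects
      .flatMap(x => Seq.fill(60)(x)) // stand-in for the real fan-out logic
      .filter(_ % 2 == 0)            // stand-in for the real filter predicate
      .cache()

    println(expanded.count())
    spark.stop()
  }
}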

Here is my spark-submit as requested in the comment below:

spark-submit --deploy-mode client --master yarn --num-executors 57 --class someMainClass /path/to/local/JAR
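For the larger cluster, the only flag that changes among those shown above is the executor count (the partition count itself is passed separately as a program argument, as mentioned in the comments):

spark-submit --deploy-mode client --master yarn --num-executors 140 --class someMainClass /path/to/local/JAR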

Jeremy Lin
  • What do you mean by partitions, on the object? Also what's your spark submit? – afeldman Aug 05 '18 at 23:49
  • Added the spark-submit to the original post. By partitions, I'm referring to the partitions of the RDD I'm operating on. The "4x as many partitions as cores" figure was something I got from https://stackoverflow.com/a/35804407/9911641 – Jeremy Lin Aug 06 '18 at 04:56
  • Below are a few questions that would help me reach an answer: Where does the process fail, during the write or during the ETL processing? What is the error code? What is the relative size of the inputs? In Spark the default SQL shuffle partition count is 200. Above we see 57 * 200 = 1140 and 140 * 200 = 2800. Adding this conf to your spark-submit should reduce the total partitioning: --conf "spark.sql.shuffle.partitions=50" – afeldman Aug 06 '18 at 22:04
  • The error log is huge; do you have any suggestions as to where to look? Here's the error I see most often: – Jeremy Lin Aug 07 '18 at 18:17
  • ERROR TransportRequestHandler: Error sending result StreamResponse{streamId=/jars/my.jar, byteCount=174242145, body=FileSegmentManagedBuffer{file=/path/to/jar, offset=0, length=174242145}} to /someIPAddress; closing connection java.io.IOException: Broken pipe – Jeremy Lin Aug 07 '18 at 18:19
  • I'm not sure of the input size, but when I cache my RDDs it's about 500 GB. The first RDD contains about 7,000 objects. From this RDD, I flatMap and filter into another RDD of about 400,000 objects. – Jeremy Lin Aug 07 '18 at 18:21
  • Also, two questions in response to what you said about SQL partitions (see the sketch below these comments): 1) Is this the same as doing a repartition(50)? If so, why would we want this (isn't 50 really low considering we have >50 cores)? 2) I forgot to note that I'm actually specifying the number of partitions myself in my spark-submit as a String[] arg. Can I accomplish the same by using --conf "spark.sql.shuffle.partitions=50"? – Jeremy Lin Aug 07 '18 at 18:24
  • @JeremyLin did you resolve the 'Broken pipe' issue? I am facing the same issue. – tooptoop4 Sep 25 '18 at 02:57
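For reference, here is my current understanding of the two mechanisms discussed in the comments above, as a rough spark-shell sketch (illustrative only, not my actual job; happy to be corrected on this):

// Paste into spark-shell; illustrative only. My actual job sets the RDD
// partition count from a command-line argument instead of hard-coding it.
import org.apache.spark.sql.functions.col

// 1) Explicit RDD partitioning: repartition(n) fixes the partition count of a
//    specific RDD, which is what my job does today.
val rdd = spark.sparkContext.parallelize(1 to 1000000).repartition(2800)
println(rdd.getNumPartitions)   // 2800

// 2) spark.sql.shuffle.partitions only controls how many partitions a shuffle
//    produces in DataFrame/Dataset/SQL operations (joins, groupBy, ...); it does
//    not change plain RDD transformations like the one above.
spark.conf.set("spark.sql.shuffle.partitions", "50")
val grouped = spark.range(1000000).groupBy(col("id") % 10).count()
println(grouped.rdd.getNumPartitions)   // 50 (assuming adaptive execution is off, as in Spark 2.x)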

0 Answers