I am trying to understand the difference in speed between my spark-submit and spark-shell jobs. I start the shell and the submitted job with the same resource allocations, but I get very different performance: the job takes ~10 minutes in the shell vs. over an hour with spark-submit. My question is: is the number of tasks shown in the progress bar of the REPL comparable to the number of executors reported when running with spark-submit? I see very different numbers for each and I wonder if I am doing something wrong.
In the shell I start it with:
--executor-cores 5 \
--executor-memory 16g \
--driver-memory 230g \
--conf "spark.driver.maxResultSize=100g" \
--conf "spark.network.timeout=360s
With that allocation I see 950 concurrent tasks in the progress bar:
... pandas_df = intent_dict_rdd.map(lambda x: Row(**x)).toDF().toPandas()
[Stage 1:==============================> (19503 + 950) / 31641]
When I do spark-submit with the same resource allocation, I only see 189 executors:
18/07/19 23:44:25 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20180719234425-0001/189 on worker-20180719233757-10.0.108.198-33953 (10.0.108.198:33953) with 5 cores
18/07/19 23:44:25 INFO StandaloneSchedulerBackend: Granted executor ID app-20180719234425-0001/189 on hostPort 10.0.108.198:33953 with 5 cores, 16.0 GB RAM
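To compare like with like, I can also count the executors that actually registered from inside the running application, rather than eyeballing the logs. A rough sketch (it goes through the private `_jsc` handle to the underlying Scala SparkContext, so treat the exact call as an assumption; `sc` is the SparkContext, i.e. `spark.sparkContext`):

# getExecutorMemoryStatus returns one entry per registered block manager,
# including the driver, so the executor count is roughly size - 1.
status = sc._jsc.sc().getExecutorMemoryStatus()
print("registered block managers:", status.size())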
I am using 10x m5.24xlarge machines, which is 96 cores and 384 GB of RAM each, so 960 cores in total. That total looks a lot more like the number of concurrent tasks I see in the shell, while the executor count from spark-submit looks a lot more like 960 / 5 cores per executor (rough arithmetic below). Am I focusing on the wrong thing? Is there any other explanation for the bad performance of spark-submit vs. spark-shell?
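Here is that back-of-the-envelope arithmetic spelled out, using only the numbers from this post (nothing Spark-specific):

# Numbers from this post: 10 machines, 96 cores each, 5 cores per executor.
machines = 10
cores_per_machine = 96
executor_cores = 5

total_cores = machines * cores_per_machine              # 960 cores in the cluster
max_executors = total_cores // executor_cores           # 192 executors at most
max_concurrent_tasks = max_executors * executor_cores   # 960 task slots

print(total_cores, max_executors, max_concurrent_tasks)
# -> 960 192 960: ~190 executors and ~950 concurrent tasks seem to describe the same
#    allocation, one counted in executors and the other in task slots.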