
I am running Spark Streaming on YARN with:

spark-submit --master yarn --deploy-mode cluster --num-executors 2 --executor-memory 8g --driver-memory 2g --executor-cores 8 ..

I am consuming Kafka through the DirectStream approach (no receiver). I have 2 topics (each with 3 partitions).

I repartition the RDD (I have one DStream) into 16 parts (assuming number of executors * number of cores = 2 * 8 = 16; is that correct?), then do foreachPartition, write each partition to a local file, and send it to another server (not Spark) over HTTP (using the Apache HTTP sync client with a pooling connection manager, via POST with multi-part).
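In code, the pipeline looks roughly like this (a minimal sketch, not the exact job; the broker address, topic names, and the postToServer stub are placeholders):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaToHttp {
      // Hypothetical sink: spool records to a local file, then POST the file
      // multi-part via a pooled Apache HTTP client (details omitted).
      def postToServer(record: String): Unit = ()

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToHttp")
        val ssc  = new StreamingContext(conf, Seconds(5))   // 5s batch interval

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")  // placeholder broker
        val topics      = Set("topic1", "topic2")                        // 2 topics, 3 partitions each

        // Direct (receiver-less) stream: each batch RDD starts with one
        // partition per Kafka partition, i.e. 6 here.
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        stream.map(_._2)
          .repartition(16)                     // 2 executors * 8 cores = 16 tasks per batch
          .foreachRDD { rdd =>
            rdd.foreachPartition { records =>
              records.foreach(postToServer)    // runs on the executors
            }
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }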

When I checked the details of this step (or job? Is that the correct naming?) through the Spark UI, it showed that all 16 tasks executed on a single executor, 8 tasks at a time.

These are the Spark UI details:

Details for Stage 717 (Attempt 0)

Index  ID  Attempt Status  Locality Level  Executor ID / Host  Launch Time Duration  GC Time Shuffle Read Size / Records Errors
0  5080  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 313.3 KB / 6137 
1  5081  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 328.5 KB / 6452 
2  5082  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 324.3 KB / 6364 
3  5083  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 321.5 KB / 6306 
4  5084  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 324.8 KB / 6364 
5  5085  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 320.8 KB / 6307 
6  5086  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 2 s 11 ms 323.4 KB / 6356 
7  5087  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:46 3 s 11 ms 316.8 KB / 6207 
8  5088  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   317.7 KB / 6245 
9  5089  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   320.4 KB / 6280 
10  5090  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   323.0 KB / 6334 
11  5091  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   323.7 KB / 6371 
12  5092  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   316.7 KB / 6218 
13  5093  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   321.0 KB / 6301 
14  5094  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:48 2 s   321.4 KB / 6304 
15  5095  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 12:11:49 2 s   319.1 KB / 6267 

I was expecting it to execute 16 tasks in parallel (2 executors * 8 cores) across one or more executors. I think I am missing something. Please help.

Update:

  1. Incoming data is not evenly distributed. For example, the 2nd partition of the 1st topic gets 5k * 5 = 25k messages per batch (maxRatePerPartition = 5k, batch interval = 5s), while the other two partitions carry almost no data at times. The 2nd topic gets ~500-4000 messages per batch, evenly distributed across its 3 partitions.

  2. When there is no data in topic 1, I see 16 parallel tasks processing across 2 executors:


Index ID  Attempt Status  Locality Level  Executor ID / Host  Launch Time Duration  GC Time Shuffle Read Size / Records Errors
0 330402  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 1 s   19.2 KB / 193 
1 330403  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   21.2 KB / 227 
2 330404  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 1 s   20.8 KB / 214 
3 330405  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   20.9 KB / 222 
4 330406  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 2 s   21.0 KB / 222 
5 330407  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   20.5 KB / 213 
6 330408  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 1 s   20.4 KB / 207 
7 330409  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   19.2 KB / 188 
8 330410  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 1 s   20.4 KB / 214 
9 330411  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   20.1 KB / 206 
10  330412  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 0.6 s   18.7 KB / 183 
11  330413  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   20.6 KB / 217 
12  330414  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 1 s   20.0 KB / 206 
13  330415  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   20.7 KB / 216 
14  330416  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/28 04:31:41 1 s   18.8 KB / 186 
15  330417  0 SUCCESS NODE_LOCAL  2 / executor2_machine_host_name  2016/12/28 04:31:41 1 s   20.4 KB / 213 
Nishant Kumar
  • Please check this [answer](http://stackoverflow.com/questions/38465692/spark-coalesce-relationship-with-number-of-executors-and-cores/40410040#40410040) once. – mrsrinivas Dec 27 '16 at 14:49
  • Try giving `--num-executors 6` (as you have 2 topics, each with 3 partitions). **1 partition = 1 executor** is the ideal choice. (`--executor-cores` will depend on your core availability and the parallelization required within each partition.) – mrsrinivas Dec 27 '16 at 14:51
  • I tried with 6 executors and 4 cores, but all tasks of this stage still execute on the same executor (now 4 at a time). – Nishant Kumar Dec 27 '16 at 15:38
  • Shouldn't they execute in parallel across all available executors? – Nishant Kumar Dec 27 '16 at 15:48
  • It should run **6 executors with 4 tasks each**. Yes, it should be _parallel across all available executors_. – mrsrinivas Dec 27 '16 at 16:32
  • Maybe it's because of [data locality](https://spark.apache.org/docs/latest/tuning.html#data-locality). Try setting `spark.locality.wait=1s` (see the sketch after these comments). – mrsrinivas Dec 27 '16 at 16:36
  • How much memory do you have on that machine? – Arnon Rotem-Gal-Oz Dec 28 '16 at 04:25
  • Each machine has 64 GB RAM and 32 cores. – Nishant Kumar Dec 28 '16 at 04:31
  • @mrsrinivas I tried `spark.locality.wait=1s`. It helped to some extent, but not completely. I still see some tasks (~40% of the time) getting scheduled on the same executor (8 tasks at once) even after 2 seconds have elapsed. – Nishant Kumar Dec 28 '16 at 05:58
  • It seems that because the jobs are small, all tasks are being scheduled on the same executor for locality. Try feeding it a large volume of data that takes more time to compute, so that it picks other executors as well. – mrsrinivas Dec 28 '16 at 06:34
  • The previous comment was based on heavy data in the 1st topic: one partition at ~10k msg/s while the other 2 partitions had no data. The 2nd topic had evenly distributed data at ~1k-3k msg/s combined. – Nishant Kumar Dec 28 '16 at 10:20
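
For reference, a minimal sketch of the locality setting suggested in the comments (the same value can also be passed to spark-submit as `--conf spark.locality.wait=1s`):

    val conf = new SparkConf()
      .setAppName("KafkaToHttp")           // app name reused from the sketch above
      .set("spark.locality.wait", "1s")    // wait only 1s (default 3s) before scheduling at a less-local level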

2 Answers


Try increasing the number of partitions to match the number of executor cores: since you are giving 8 executor cores, increase the number of partitions on the Kafka topic to 8. Also, check what happens if you do not repartition at all.
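
To illustrate the no-repartition variant: with the direct approach each batch RDD has one partition per Kafka partition, so dropping the `repartition(16)` gives 6 tasks per batch here (a sketch reusing `stream` and `postToServer` from the question's sketch):

    // Without repartition: one task per Kafka partition (2 topics * 3 partitions = 6).
    stream.map(_._2).foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        records.foreach(postToServer)
      }
    }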


Set the parameters below along with `--num-executors 6`:

spark.default.parallelism

spark.streaming.concurrentJobs

Set the values of these parameters according to your requirements and environment. This should work for you.
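
A minimal sketch of setting these in the driver (the values shown are placeholders to tune, not recommendations; both can also be passed as `--conf` flags to spark-submit):

    val conf = new SparkConf()
      .setAppName("KafkaToHttp")
      .set("spark.default.parallelism", "24")     // placeholder: e.g. 6 executors * 4 cores
      .set("spark.streaming.concurrentJobs", "2") // placeholder: jobs run concurrently (default 1; undocumented)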

Sachin