
I am trying to run a Spark application on a Spark standalone cluster with 3 nodes in total.

There are 3 workers on the cluster; one node has 4 GB of RAM and the other two have 8 GB.

I am executing the same application with different numbers of cores (2, 3, 4, 5), but the execution time stays the same.

I am submitting the application to the cluster using sparkclr-submit.
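The submit command looks roughly like the sketch below. The master URL, app name and paths are placeholders, and I change --total-executor-cores between runs; as far as I understand, sparkclr-submit forwards standard spark-submit options such as this one to the standalone master.

```
rem Placeholder sketch of the submit command (names and paths are illustrative)
sparkclr-submit.cmd ^
  --master spark://master-host:7077 ^
  --total-executor-cores 4 ^
  --executor-memory 4g ^
  --exe MySparkApp.exe ^
  C:\path\to\MySparkApp
```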

Can anyone tell me why this might be happening?

Here is the image of the Spark UI:

Thanks.

[Spark UI screenshot]

– Jay Prajapati
  • very good question – aayushi Jul 11 '17 at 12:07
  • @aayushi Thanks for the comment, but I actually want an answer to it! – Jay Prajapati Jul 11 '17 at 12:08
  • Apparently you don't have enough parallelism to utilize more than 1 core; I suggest investigating the issue using the Spark UI – Raphael Roth Jul 11 '17 at 12:12
  • @RaphaelRoth I have also investigated using the Spark UI; it shows my application with whichever number of cores (2, 3, 4 or 5) I assigned to it – Jay Prajapati Jul 11 '17 at 13:13
  • No, I meant that you should check whether all your jobs & stages (or at least the long-running ones) have enough tasks. Maybe there is a bottleneck in your code – Raphael Roth Jul 11 '17 at 13:16
  • @RaphaelRoth I am not getting what you mean; can you help me by sharing some documents or a link regarding it? Thanks – Jay Prajapati Jul 11 '17 at 13:20
  • The Spark UI is normally on port 4040; you should see something like this: http://www.hammerlab.org/images/spree/stages.png . What I would do is check whether all long-running stages have at least 5 tasks, and that those tasks are not (too) skewed; sometimes 199 tasks are empty and only 1 task contains all the data, etc. You can see the task statistics if you click on a stage in the Spark UI (a sketch of this check follows these comments) – Raphael Roth Jul 11 '17 at 13:34
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/148908/discussion-between-jay-prajapati-and-raphael-roth). – Jay Prajapati Jul 11 '17 at 14:03
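Following Raphael Roth's suggestion above, here is a minimal sketch of the kind of check he describes, written in Scala for the spark-shell. The input path and partition count are made up for illustration; the same checks should be possible through the SparkCLR API.

```scala
// Minimal sketch: check whether a long-running stage actually has enough
// partitions (tasks) to use all the cores requested. Names are illustrative.
val rdd = sc.textFile("hdfs:///some/input")   // hypothetical input path

// If this prints 1 (or a very small number), extra executor cores sit idle,
// which would explain why 2, 3, 4 or 5 cores all take the same time.
println(rdd.getNumPartitions)

// Per-partition record counts reveal skew: one huge partition and many
// empty ones also serialises the work onto a single core.
rdd.mapPartitions(it => Iterator(it.size)).collect().foreach(println)

// Repartitioning to at least the total number of cores lets Spark run
// that many tasks in parallel.
val repartitioned = rdd.repartition(8)
```

If getNumPartitions is 1, or one partition holds almost all the records, then adding cores cannot speed anything up, which would match the symptom described in the question.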

0 Answers