I ran an SVM algorithm using the MLlib library in Spark on a dataset of 8 GB and 7 million rows. I am running Spark in standalone mode on a single node.

I used /usr/bin/time -v to capture data about the job, including peak memory utilization and % CPU time, among other things. The CPU utilization it reported was a mere 6%. I also monitored top for some time while the program was running, and I could see more than 100% being used almost consistently. Why did /usr/bin/time show only 6%?

Some more details: my machine has 16 GB of RAM, and the program was consuming 13.88 GB. It ran for 2.1 hours.

Any insights, anyone?

Testing123

1 Answer

I figured out the problem. What /usr/bin/time showed (6%) was a percentage of the total CPU available (8 threads in this case), while top was showing 100% for a single thread.
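To make the two scales concrete, here is a minimal sketch of the arithmetic, taking the interpretation above at face value. It is not how either tool computes its figure; the function names are hypothetical, and the only inputs are the numbers from this post (8 threads, 6%, 100%).

```python
def per_core_to_machine(per_core_percent, logical_cores):
    """Convert a per-core percentage (where 100% = one busy thread)
    into a share of the whole machine's CPU capacity."""
    return per_core_percent / logical_cores


def machine_to_per_core(machine_percent, logical_cores):
    """Convert a whole-machine percentage back to the per-core scale."""
    return machine_percent * logical_cores


# One fully busy thread on an 8-thread machine, seen on the whole-machine scale.
print(per_core_to_machine(100.0, 8))  # 12.5

# A 6% whole-machine reading, seen on the per-core scale.
print(machine_to_per_core(6.0, 8))    # 48.0
```

On the per-core scale a multi-threaded process can exceed 100%, which is why a reading above 100% in top is not by itself a contradiction of a small whole-machine figure.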

By the way, if it helps anyone: the reason only 1 thread was being used instead of all 8 was that I had specified "local" and not "local[*]" in my SparkContext (sc = SparkContext("local", ...)). Read more about it HERE.
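For anyone hitting the same thing, here is a minimal PySpark sketch of the fix. The helper names `local_master` and `make_context` and the app name are hypothetical, made up for illustration; the master strings themselves ("local" for one thread, "local[K]" for K threads, "local[*]" for as many threads as logical cores) are the standard Spark local-mode forms.

```python
def local_master(threads=None):
    """Build a Spark master URL for local mode.

    None -> "local[*]" (one worker thread per logical core)
    K    -> "local[K]" (exactly K worker threads)
    Plain "local" would run everything on a single thread.
    """
    if threads is None:
        return "local[*]"
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    return "local[%d]" % threads


def make_context(app_name, threads=None):
    """Create a SparkContext that uses the requested number of local threads."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster(local_master(threads)).setAppName(app_name)
    return SparkContext(conf=conf)
```

Calling `sc = make_context("svm-job")` and then checking `sc.defaultParallelism` should show whether Spark actually picked up more than one thread.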

Testing123