In Spark's 'client' deploy mode the driver does not consume a core; only Spark applications do. Why, then, does the driver need a core for itself in 'cluster' mode?
2 Answers
0
In client mode the machine that submits the job is the driver.
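As a rough illustration, the deploy mode is chosen at submission time with spark-submit (the master URL and application jar below are placeholders):

```bash
# Client mode: the driver JVM runs inside this spark-submit process
# on the submitting machine, while the executors run in the cluster.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  my-app.jar

# Cluster mode: the driver is launched on a node inside the cluster,
# and spark-submit can exit once the application has been handed off.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  my-app.jar
```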

AssHat_
- This is why terminating that process (with SIGINT or kill) results in the loss of the Spark application's driver, and the application will exit. – AssHat_ Jun 27 '19 at 00:12
- Still confused why cluster mode needs an entire CPU core; couldn't the submitting machine be the driver in cluster mode? – tooptoop4 Jun 27 '19 at 22:03
- In 'cluster' mode the driver runs within the cluster; the machine that submits the job is no longer involved once the submission is complete. In 'client' mode the driver runs locally, with the executors in the cluster. That is the difference between the two modes. – AssHat_ Jun 28 '19 at 01:15
0
A core in the Spark context is not the same as a physical CPU core; it is just a unit of computation with a set amount of RAM, and one is needed to run any process. The driver needs its core because it coordinates Spark tasks on the cluster, but in reality it is most likely consuming only a tiny fraction of a CPU and roughly 1–2 GB of memory.
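As a sketch of how that reservation is expressed (the master and jar are placeholders; --driver-cores is only honored in cluster mode, e.g. on YARN or Spark standalone):

```bash
# Reserve one core and 2g of memory for the driver when it runs
# inside the cluster; in client mode the driver simply uses whatever
# resources the submitting machine has.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-cores 1 \
  --driver-memory 2g \
  my-app.jar
```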

tk421
- How can I get more Spark cores than CPU cores? Otherwise it is effectively the same, in that I need at least one whole CPU core for each Spark driver. – tooptoop4 Jun 28 '19 at 18:12