
I have an Apache Spark 1.6.1 standalone cluster set up on a single machine with the following specifications:

  • CPU: Core i7-4790 (# of cores: 4, # of threads: 8)
  • RAM: 16GB

I am using the following settings in conf/spark-env.sh:

export SPARK_WORKER_MEMORY 
export SPARK_WORKER_INSTANCES 
export SPARK_WORKER_CORES
export SPARK_WORKER_DIR
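
As I understand these variables from the standalone-mode documentation (my own reading, so I may be off):

# SPARK_WORKER_MEMORY    - total memory a worker may hand out to executors (e.g. 14g)
# SPARK_WORKER_INSTANCES - number of worker processes to start on this machine
# SPARK_WORKER_CORES     - number of cores each worker is allowed to offer to executors
# SPARK_WORKER_DIR       - directory for the workers' scratch space and logs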

Since the machine's processor has 4 cores, I figured the possible configurations would be:

export SPARK_WORKER_MEMORY=14g
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_CORES=1
export SPARK_WORKER_DIR=/local_drive/sparkdata

export SPARK_WORKER_MEMORY=14g
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_CORES=2
export SPARK_WORKER_DIR=/local_drive/sparkdata

export SPARK_WORKER_MEMORY=14g
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_CORES=3
export SPARK_WORKER_DIR=/local_drive/sparkdata

export SPARK_WORKER_MEMORY=14g
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_CORES=4
export SPARK_WORKER_DIR=/local_drive/sparkdata

export SPARK_WORKER_MEMORY=7g
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=1
export SPARK_WORKER_DIR=/local_drive/sparkdata

export SPARK_WORKER_MEMORY=7g
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=2
export SPARK_WORKER_DIR=/local_drive/sparkdata

export SPARK_WORKER_MEMORY=4608m   # ~4.5g; fractional values like 4.5g don't parse
export SPARK_WORKER_INSTANCES=3
export SPARK_WORKER_CORES=1
export SPARK_WORKER_DIR=/local_drive/sparkdata

export SPARK_WORKER_MEMORY=3584m   # ~3.5g
export SPARK_WORKER_INSTANCES=4
export SPARK_WORKER_CORES=1
export SPARK_WORKER_DIR=/local_drive/sparkdata

So what I thought was:

  • The memory has to be divided among the worker instances,
  • The number of worker cores gets multiplied by the number of worker instances; therefore, I can't have 4 worker instances with 2 worker cores each, because I don't have 8 cores (see the sketch below).
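
In other words, I expected the launch scripts to enforce something like the following check (a sketch of my assumption; as far as I can tell, Spark does not actually do this):

# Validation I assumed Spark would perform at worker startup:
PHYSICAL_CORES=4                                              # nproc reports 8 logical CPUs here (hyper-threading)
REQUESTED=$(( SPARK_WORKER_INSTANCES * SPARK_WORKER_CORES ))
if [ "$REQUESTED" -gt "$PHYSICAL_CORES" ]; then
    echo "error: $REQUESTED cores requested, only $PHYSICAL_CORES available" >&2
fi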

The issue is that I tested this by setting:

export SPARK_WORKER_MEMORY=14g
export SPARK_WORKER_INSTANCES=3
export SPARK_WORKER_CORES=40
export SPARK_WORKER_DIR=/local_drive/sparkdata

And I got no error, and the Spark web UI shows "40 cores". What is happening? How many worker cores and worker instances can I really have, then?
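
For reference, the OS itself reports nothing close to 40 logical CPUs (hyper-threading is enabled, hence 8 rather than 4):

nproc                              # prints 8: 4 physical cores x 2 hardware threads
grep -c ^processor /proc/cpuinfo   # also 8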

Thanks in advance.

  • I tried setting 100 cores and was able to; the Spark UI says 100 cores are available. – Amit Kumar Jun 04 '16 at 15:05
  • Yes, true. That is what the Spark UI says, but how come, if I only have 4 cores? – User2130 Jun 04 '16 at 15:09
  • Possible duplicate of [Spark - what happens if i try to use more cores than I have?](http://stackoverflow.com/questions/34912457/spark-what-happens-if-i-try-to-use-more-cores-than-i-have) – zero323 Jun 04 '16 at 15:21
  • That helps a lot! The only question that remains is: is the maximum number of cores then infinite? (Disregarding the recommended number; I'm only asking about what is possible.) – User2130 Jun 04 '16 at 15:33
  • Nope. Threads are relatively expensive, and there are both hard limitations (memory size, stack size) and system limits (`/proc/sys/kernel/threads-max`; see the snippet below). Using more than a few times the number of available cores doesn't really make sense anyway. – zero323 Jun 04 '16 at 15:55
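
For completeness, the system limits zero323 mentions can be inspected like this (values vary per system):

cat /proc/sys/kernel/threads-max   # system-wide limit on the number of threads
ulimit -u                          # per-user limit on processes/threads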

0 Answers