
I have a question/problem regarding dynamic resource allocation. I am using Spark 1.6.2 with the standalone cluster manager.

I have one worker with 2 cores. I set the following properties in the spark-defaults.conf file on all my nodes:

spark.dynamicAllocation.enabled  true
spark.shuffle.service.enabled true
spark.deploy.defaultCores 1
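
For context, `spark.deploy.defaultCores` is a cluster-wide default that applies only to applications that do not set `spark.cores.max` themselves; a minimal sketch of the per-application override, with an illustrative value:

# hypothetical per-application override (e.g. in that application's own configuration);
# it takes precedence over spark.deploy.defaultCores for the application that sets it
spark.cores.max 2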

I run a sample application with many tasks. I open the web UI on port 4040 on the driver, and I can verify that the above configuration is in effect.

My problem is that no matter what I do, my application only gets 1 core even though the other core is available.

Is this normal, or do I have a problem in my configuration?

The behaviour I want is this: I have many users working with the same Spark cluster. I want each application to get a fixed number of cores unless the rest of the cluster is idle, in which case I want the running applications to get the total number of cores until a new application arrives...

Do I have to switch to Mesos for this?

  • You specify `spark.deploy.defaultCores 1`, so you get one core. – Yuval Itzchakov Oct 27 '16 at 08:39
  • I suspected that this is the problem, but how can I make sure that each application will get a minimum number of cores? – Ofer Eliassaf Oct 27 '16 at 08:44
  • If you specify 1, you'll get 1. If you want a larger minimum, change `defaultCores`. Does that answer your question? Not sure I understand what you mean exactly. – Yuval Itzchakov Oct 27 '16 at 08:46
  • According to what you mentioned, `defaultCores` means maximum, not minimum. I need a minimum number of cores for each application. – Ofer Eliassaf Oct 27 '16 at 08:48
  • That's a real bummer. So dynamic scheduling is a pretty useless mode if you want to share a cluster among developers, make sure each gets a minimum share, and keep all your resources utilized. Very weird implementation/design choices by the Spark team. – Ofer Eliassaf Oct 27 '16 at 08:53
  • Wait wait, I forgot this was dynamic allocation. You can definitely set the minimum *executors* via `spark.dynamicAllocation.minExecutors` and set the default cores to 1. So you can get two executors, each with one core. – Yuval Itzchakov Oct 27 '16 at 08:54
  • How is that helpful? I need minimum cores... I don't care about executors. I need to make sure that when multiple developers share the cluster they don't interfere with each other and each gets a share of the cluster, but that at all times all the cores are in use to avoid wasting money. – Ofer Eliassaf Oct 27 '16 at 08:57
  • I think there might be a misunderstanding about the usage of cores. Executors cannot dynamically use more cores if they become available; executors always start with the specified number of cores. Dynamic allocation means that Spark will spawn more executors (with the specified number of cores) if needed. – LiMuBei Oct 27 '16 at 09:00
  • You can't require a minimum number of cores, at least not in Standalone; I don't know about other resource managers. But you *can* work around that limitation by setting a minimum number of executors and default cores; that way you can play with the amount of resources each job has. Dynamic allocation is all about the use of executors, not cores specifically. – Yuval Itzchakov Oct 27 '16 at 09:01
  • Thanks a lot for your help! – Ofer Eliassaf Oct 27 '16 at 09:04
  • I don't get it: say, for example, I have 3 developers and 6 cores. If all 3 are running simultaneously, I want each to get 2 cores. If only one is running, I want them to get all 6 cores. How can this be achieved? – Ofer Eliassaf Oct 27 '16 at 09:13
  • That will only work if you never start executors with more than 1 core. This gives you the finest granularity for core usage, as an executor is then equivalent to a core (see the config sketch after these comments). Spark jobs usually profit from multi-core executors, though, as tasks share resources, which makes caching, for example, more effective. – LiMuBei Oct 27 '16 at 09:22
  • You can create a system which checks the available resources you have in the cluster and schedules according to that. You won't need dynamic allocation for that though. – Yuval Itzchakov Oct 27 '16 at 09:22
  • Sorry, but writing a custom system to use available resources is like writing Spark myself. I tried setting all the configuration you mentioned and it didn't work. I couldn't convince my cluster to launch more executors at times when there were available resources. I've come to the conclusion that my use case cannot be achieved using dynamic resource allocation. I guess I will have to work with Mesos or live with a broken cluster. I just don't get what this dynamic scheduling is for if it can't make such a simple use case work. – Ofer Eliassaf Oct 27 '16 at 10:40
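
To make the workaround from the comments concrete, here is a minimal spark-defaults.conf sketch for the 3-developer / 6-core example (values are illustrative, not a tested configuration):

spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
# one core per executor, so one executor is equivalent to one core
spark.executor.cores 1
# floor of 2 one-core executors (i.e. 2 cores) per application
spark.dynamicAllocation.minExecutors 2
# a single application may grow to all 6 cores when the cluster is otherwise idle
spark.cores.max 6

Note that this only approximates the desired policy: nothing is reserved, so an application that already holds all the cores releases them only when its executors go idle (see spark.dynamicAllocation.executorIdleTimeout), and a newly arriving application may have to wait until then.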

0 Answers