
I have a 3-node test cluster and several jobs (simple config, no constraints, Java services). My problem is that every time I start a job, it is started on the first node. If I increase count = 2 and add a distinct-hosts constraint, there are also allocations on the other nodes. But if I start 50 jobs with count = 1, there are 50 allocations on the first node and none on node2 or node3.

job "test" {
  datacenters = ["dc1"]
  type        = "service"

  group "test" {
    count = 1

    task "test" {
      driver = "java"

      config {
        jar_path    = "/usr/share/java/test.jar"
        jvm_options = ["-Xmx256m", "-Xms256m"]
      }

      resources {
        cpu    = 300
        memory = 256
      }
    }
  }
}

Now I want to understand/see how Nomad selects the node for the allocations. All 3 nodes have the same resources, so shouldn't the jobs be distributed equally?

EDIT: Suddenly the jobs are being distributed. So my new question is: is there a verbose output or something similar where I can see how and why Nomad chose a specific node when starting a new job?

imehl

1 Answer


As given in the official documentation:

The second phase is ranking, where the scheduler scores feasible nodes to find the best fit. Scoring is primarily based on bin packing, which is used to optimize the resource utilization and density of applications, but is also augmented by affinity and anti-affinity rules. Nomad automatically applies a job anti-affinity rule which discourages colocating multiple instances of a task group. The combination of this anti-affinity and bin packing optimizes for density while reducing the probability of correlated failures.

This means that Nomad will try to "fill" a particular node first. Let me take an example: suppose you have three jobs with memory requirements

j1(200M), j2(300M), j3(500M)

and have three nodes with free resources

n1(1G), n2(2G), n3(3G).

In this case, Nomad will choose the node that gets filled first. So when you try to schedule j1, n1 is selected. Now the state of the nodes, with their remaining free resources, will be:

n1(800M), n2(2G), n3(3G)

Now, suppose you want to schedule j2. In this case, n1 is selected again, because placing the job there fills up a node faster than placing it on n2 or n3.

Scheduling j3 (500M) again picks n1, which uses up its last free memory (200M + 300M + 500M = 1G). Hence, your final allocation with free resources will look like

n1 (j1,j2,j3)(0M)      n2(2G)       n3(3G)

Now if a job j4 comes in with a 200M requirement, n2 will be selected, since n1 no longer has room. This brings your cluster state to

n1 (j1,j2,j3)(0M)      n2(j4)(1800M)       n3(3G)
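The walkthrough above can be sketched as a toy best-fit loop. This is a simplification, not Nomad's actual ranking code: real scoring also weighs CPU, anti-affinity, and other dimensions, and the node names and sizes here just mirror the example.

```python
# Toy best-fit simulation of the bin-packing example above.
# Rule of thumb being illustrated: among feasible nodes, prefer
# the one that ends up most utilized ("fills" soonest).

nodes = {"n1": 1000, "n2": 2000, "n3": 3000}  # free memory in MB
jobs = [("j1", 200), ("j2", 300), ("j3", 500), ("j4", 200)]

placements = {}
for job, need in jobs:
    # Feasibility check: only nodes with enough free memory qualify.
    feasible = {n: free for n, free in nodes.items() if free >= need}
    # Best fit: pick the feasible node with the least free memory.
    target = min(feasible, key=feasible.get)
    nodes[target] -= need
    placements[job] = target

print(placements)  # {'j1': 'n1', 'j2': 'n1', 'j3': 'n1', 'j4': 'n2'}
print(nodes)       # {'n1': 0, 'n2': 1800, 'n3': 3000}
```

Running this reproduces the example: j1, j2, and j3 all land on n1 until it is full, and only then does j4 spill over to n2.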

If you would like to understand more about how bin packing works in Nomad, you can check the scheduling internals documentation at https://www.nomadproject.io/docs/internals/scheduling.html

Also, the scores calculated when assigning allocations are exposed in the result of the evaluation API. Submit the job:

$ nomad run <job file>

then note down the evaluation ID it prints and make an HTTP request to the evaluation API:

$ curl localhost:4646/v1/evaluation/<eval_id>

This returns the result of the scheduler's calculations and the conditions for the job being scheduled. Once an allocation is placed, you can also inspect its placement scores with nomad alloc status -verbose <alloc_id>, which prints a placement metrics section. The nomad plan command is very useful as well for understanding task group placements: it performs a dry run and tells you whether you have enough resources in your datacenter to run the job.

AnishM