
In our production environment (a 10-node cluster), Nomad's bin-packing algorithm causes significant problems with system utilization.

Possible solutions are the following rules/policies:

1. Distinct hosts

Use case: mainly for multiple TCP/UDP listeners behind a load balancer. It works fine and does exactly what you would expect.
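If it helps, here is a minimal, untested sketch of that constraint as I understand it from the Nomad docs (the job and group names are just placeholders):

    job "lb-listeners" {
      datacenters = ["dc1"]

      # Place every allocation of this job on a different client node.
      constraint {
        operator = "distinct_hosts"
        value    = "true"
      }

      group "listener" {
        count = 3

        task "server" {
          driver = "raw_exec"

          config {
            command = "/bin/sleep"
            args    = ["500"]
          }
        }
      }
    }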

2. Resource limiting

Via the Nomad client configuration, as follows:

    client {
      enabled           = true
      cpu_total_compute = 12000

      reserved {
        cpu    = 3000
        memory = 33000
        disk   = 1
      }

      network_speed = 10000
      servers       = ["127.0.0.1:4647"]

      options {
        "driver.raw_exec.enable" = "1"
      }
    }

This is painful and limiting.

3. Spread stanza

This sounds promising, but I could not get it to work using the Nomad Java SDK.

The Nomad Java SDK 0.9.0-SNAPSHOT doesn't support the spread stanza via its API. Instead, it is possible to use the method "addUnmappedProperty" to add custom JSON structures/arrays:

    Job jobSpec = nomadContext.getJob();

    // Build a spread entry equivalent to the HCL spread stanza.
    List<Object> spreads = new ArrayList<>();
    Map<String, Object> spreadStanza = new HashMap<>();
    spreadStanza.put("Attribute", "${node.unique.id}");
    spreadStanza.put("Weight", 100);
    // spreadStanza.put("SpreadTarget", null);
    spreads.add(spreadStanza);

    // Attach the spread at the job level ...
    jobSpec.addUnmappedProperty("Spreads", spreads);

    // ... and at each task group level.
    for (TaskGroup taskGroup : jobSpec.getTaskGroups()) {
        taskGroup.addUnmappedProperty("Spreads", spreads);
    }

Unfortunately, I could not get this to work; allocation-spread is not shown in the verbose allocation status.

Another example uses a simple HCL job specification deployed via the command line:

 job "sleep" {
  datacenters = ["dc1"]
  spread {
    attribute = "${node.unique.id}"
     weight    = 100
  }

  group "example" {
   count=10
   spread {
    attribute = "${node.unique.id}"
     weight    = 100
  }


    task "server" {
      driver = "raw_exec"

      config {
        command = "/bin/sleep"
        args = [
          "500"
        ]
      }

      resources {
        network {
          mbits = 10
        }
      }
    }
  }
}

In this case, the allocation-spread score is shown:

nomad alloc status -verbose 1feb7476

Node                                  job-anti-affinity  node-reschedule-penalty  node-affinity  allocation-spread  binpack  final score
4c4e3bb2-9568-3f5d-3a8c-fd056f258ed0  -0.4               0                        0              0.667              0.896    0.387
4b36b048-a24b-e0e9-a789-625764fcfa70  -0.5               0                        0              -0.667             0.901    -0.0886

I appreciate any help.

Thank you.

Ivan Prostran

2 Answers


Try applying the spread only once, not at both the job level and the group level.
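For example, a trimmed (and untested) version of the job from the question, with the spread declared only once at the job level:

job "sleep" {
  datacenters = ["dc1"]

  # Declared once, at the job level; it applies to all groups below.
  spread {
    attribute = "${node.unique.id}"
    weight    = 100
  }

  group "example" {
    count = 10

    task "server" {
      driver = "raw_exec"

      config {
        command = "/bin/sleep"
        args    = ["500"]
      }
    }
  }
}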

chucky_z

A bit late to this topic, but still...

There is a spread stanza you can use in your job description. This will propagate to all task groups listed in the same job description. (Alternatively, the spread stanza can be used in a group definition block, affecting only that group.)

spread {
    attribute = "${node.unique.id}"
    weight = 100
}

However, if, like me, you want to spread jobs evenly rather than allocations, this approach will not help you. You will still end up with bin-packed jobs.

I did eventually find a solution to job spreading as well:

There is a way to configure the Nomad server to use spread as the default scheduling algorithm instead of binpack. This can be done through server configuration during the bootstrap phase (unfortunately not working for me), and also via the Nomad API (this is working for me).
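For reference, the bootstrap-phase variant is (as far as I understand the docs) a default_scheduler_config block in the server configuration; note that it only takes effect when the cluster is first bootstrapped:

server {
  enabled          = true
  bootstrap_expect = 1

  # Only applied on initial cluster bootstrap; on an already-running
  # cluster, use the operator API shown below instead.
  default_scheduler_config {
    scheduler_algorithm = "spread"
  }
}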

On a running cluster, you can change the default scheduling algorithm by calling the following API:

POST .. /v1/operator/scheduler/configuration

with the following body:

{
  "SchedulerAlgorithm": "spread",
  "MemoryOversubscriptionEnabled": false,
  "RejectJobRegistration": false,
  "PauseEvalBroker": false,
  "PreemptionConfig": {
    "SystemSchedulerEnabled": true,
    "SysBatchSchedulerEnabled": false,
    "BatchSchedulerEnabled": false,
    "ServiceSchedulerEnabled": true
  }
}

Check the online Nomad documentation for a description of each of the keys.

You can also use the same endpoint with GET to check the current scheduler config.
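If I remember correctly, recent Nomad versions also expose the same configuration through the CLI via nomad operator scheduler get-config and nomad operator scheduler set-config -scheduler-algorithm=spread, which talk to this endpoint.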

I'm using Nomad v1.6.1 ATM.

Hope this helps...

AlYosha