
I'm trying to understand how Nomad spread works with client failures.

In Nomad jobs, you can define a spread stanza so that a job's instances get spread across all clients.

Here are the docs: https://www.nomadproject.io/docs/job-specification/spread

As the spread is a soft preference, if one of the clients goes down for any reason, Nomad will migrate all the jobs running on the lost client to the other available clients. (This holds even with bin-packing.)
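To make the "soft preference" point concrete, here is a toy Python model. The scoring function and the node names (`client-a`, `client-b`) are hypothetical and much simpler than Nomad's real spread algorithm. The only point is that a spread score nudges the choice of node but never rules a node out, so when one client is down every allocation still gets placed on the survivor.

```python
# Toy model of a *soft* spread preference (hypothetical; NOT Nomad's real
# scoring). Spread only adjusts a node's score; it never makes a node
# infeasible, so placement still succeeds when only one node is available.


def spread_score(node: str, counts: dict[str, int], total: int) -> float:
    """Higher when `node` currently holds fewer of this job's allocations."""
    if total == 0:
        return 0.0
    return 1.0 - counts.get(node, 0) / total


def place(num_allocs: int, ready_nodes: list[str]) -> dict[str, int]:
    """Place allocations one at a time on the best-scoring ready node."""
    if not ready_nodes:
        raise RuntimeError("no feasible nodes")
    counts = {n: 0 for n in ready_nodes}
    for placed in range(num_allocs):
        # max() returns the first node on ties, so the result is deterministic
        best = max(ready_nodes, key=lambda n: spread_score(n, counts, placed))
        counts[best] += 1
    return counts


print(place(2, ["client-a", "client-b"]))  # {'client-a': 1, 'client-b': 1}
print(place(2, ["client-a"]))              # {'client-a': 2}
```

With both clients up, the preference spreads the two allocations; with one client down, the preference simply cannot be satisfied and both allocations share the remaining node.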

In the case of a two-client grid and a job with two allocations, if one client fails, both allocations will end up running on the same client.

What happens when the grid recovers and brings up a new client? Will the jobs be re-spread, following the spread stanza, to both nodes, or will the two allocations continue to run on the same client until the job is re-run?

summerbulb

1 Answer


I'm having a hard time tracking down an answer on this. I was under the impression that, unless there are resource constraints, Nomad will not migrate an allocation. So in your example, the allocations will not be re-spread.
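The behavior described above (only allocations that lost their node get re-placed) can be sketched with a toy reconciler in Python. `Alloc`, `reconcile`, and the node names are hypothetical, and the "least-loaded node" rule is a crude stand-in for spread, not Nomad's real scheduler. The sketch walks through the two-client scenario from the question under the assumption that healthy running allocations are never moved.

```python
# Toy reconciler (hypothetical; a simplification of the behavior described
# in this answer, NOT Nomad's actual scheduler code). Only allocations whose
# node is gone are re-placed; healthy running allocations are left alone.
from dataclasses import dataclass


@dataclass
class Alloc:
    name: str
    node: str


def reconcile(allocs: list[Alloc], ready_nodes: list[str]) -> list[Alloc]:
    """Re-place allocations on lost nodes; never move healthy ones."""
    counts = {n: 0 for n in ready_nodes}
    for a in allocs:
        if a.node in counts:
            counts[a.node] += 1
    for a in allocs:
        if a.node not in counts:  # node lost: this allocation needs a new home
            # least-loaded ready node, a crude stand-in for the spread score
            a.node = min(ready_nodes, key=lambda n: counts[n])
            counts[a.node] += 1
    return allocs


allocs = [Alloc("web[0]", "client-a"), Alloc("web[1]", "client-b")]
reconcile(allocs, ["client-a"])              # client-b fails
print([a.node for a in allocs])              # ['client-a', 'client-a']
reconcile(allocs, ["client-a", "client-c"])  # a new client-c joins
print([a.node for a in allocs])              # ['client-a', 'client-a']
```

In this model a new client only receives work when some allocation needs a fresh placement, which is why `client-c` stays empty after it joins.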

A few places to search for more certainty:

  1. Node affinity (https://learn.hashicorp.com/tutorials/nomad/affinity), binpack, job-anti-affinity, and node-reschedule-penalty could all factor into your allocation being moved (but again, I think this only happens under resource contention).
  2. Preemption is now supported, but again it seems to be more about resource constraints: https://www.nomadproject.io/docs/internals/scheduling/preemption
  3. There is a spread test here that I think could be modified to test your use case: https://github.com/hashicorp/nomad/blob/235f938e87afbdc73037c1868a17668c06a8cf94/scheduler/generic_sched_test.go#L616. I tried to do so myself, but got a little bogged down in the complexity.

Anecdotally, I have never seen an allocation willingly migrate off a node that still has free resources, and I have not seen allocations move on their own to new, empty nodes.

maxm