I have a system with the following properties:

  1. There are workers that work on jobs. Workers can be added or removed. Each worker can run multiple jobs concurrently.

  2. There are jobs. These jobs run forever (infinite duration) and are assigned to workers. Jobs can be added or removed.

I am using round robin to assign jobs to workers at startup, and this works pretty well.

However, I want to rebalance the job assignments when workers are added or removed and when jobs are added or removed.

While it is possible to reassign everything with round robin whenever any of the above changes happens, that would make more changes than necessary.

In other words, are there any round-robin rebalancing algorithms that minimize the number of changes (diffs) to the assignments?

F21
  • How can a worker work on multiple jobs at the *same* time? There is also an implicit assumption in your description that a worker can *interrupt* its work on a job and that another worker can *resume* the interrupted job. Is that problem already solved, or do you expect the answer to your question to solve it? – fjardon Sep 14 '16 at 11:36
  • A worker can work on multiple jobs at the same time because each job is launched as a goroutine. In other words, each job can be thought of as a thread (although goroutines are not threads). The job is relatively simple: it pushes messages from a database into a message bus, so it is trivial to move a job to another worker. – F21 Sep 14 '16 at 11:48
  • Can a single job be assigned to multiple workers at the same time? – Saurav Sahu Sep 14 '16 at 12:27
  • @SauravSahu No. Each job can only be assigned to 1 worker. – F21 Sep 14 '16 at 12:29

2 Answers

I assume your round-robin approach assigns the jobs as follows:

W1   W2   W3   W4
-----------------
J1   J2   J3   J4
J5   J6   J7   J8
J9

Adding a new job is simple. Remember the worker to which you assigned the last job (this is the state of the round-robin algorithm; I will call it the *last worker* below), assign the new job to the next worker (wrapping around after the final worker), and increment the last worker.
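A minimal Python sketch of this bookkeeping, assuming the assignment is stored as a list of per-worker job lists and the last worker as an index into it (`add_job` and this layout are illustrative, not from the question). Starting with the last worker set to the final index makes the very first job land on W1, which reproduces the table above:

```python
def add_job(workers, last, job):
    """Give `job` to the worker after `last` and return that worker's index.

    workers: list of per-worker job lists, in round-robin order (hypothetical layout)
    last:    index of the worker that received the most recent job
    """
    nxt = (last + 1) % len(workers)
    workers[nxt].append(job)
    return nxt  # the new "last worker"


# Rebuild the 4-worker example by dealing J1..J9 out one at a time.
workers = [[] for _ in range(4)]
last = len(workers) - 1  # no jobs yet, so the first job goes to W1
for i in range(1, 10):
    last = add_job(workers, last, f"J{i}")

print(workers)  # [['J1', 'J5', 'J9'], ['J2', 'J6'], ['J3', 'J7'], ['J4', 'J8']]
print(last)     # 0 (W1 received J9)
```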

To remove a job (e.g. J7 in the example above), first remove it from its worker:

W1   W2   W3   W4
-----------------
J1   J2   J3   J4
J5   J6        J8
J9

Then take the last job of the last worker and reassign it to the worker that lost a job (unless the removed job was itself the last job):

W1   W2   W3   W4
-----------------
J1   J2   J3   J4
J5   J6   J9   J8

Finally, decrement the last worker.
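The removal step can be sketched the same way, under the same assumed layout (a list of per-worker job lists plus the index of the last worker; `remove_job` is a hypothetical name). I read "unless the removed job was the last job" as "unless the job came from the last worker itself", in which case there is no gap to fill from elsewhere:

```python
def remove_job(workers, last, job):
    """Remove `job`, refill the gap from the last worker, and return the new last index."""
    owner = next(i for i, jobs in enumerate(workers) if job in jobs)  # assumes job is assigned
    workers[owner].remove(job)
    if owner != last:
        # Move the most recently assigned job (the last worker's last job) into the gap.
        workers[owner].append(workers[last].pop())
    return (last - 1) % len(workers)  # decrement the last worker, wrapping around


workers = [["J1", "J5", "J9"], ["J2", "J6"], ["J3", "J7"], ["J4", "J8"]]
last = remove_job(workers, 0, "J7")  # W1 (index 0) received the last job
print(workers)  # [['J1', 'J5'], ['J2', 'J6'], ['J3', 'J9'], ['J4', 'J8']]
print(last)     # 3 (W4)
```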

To add a worker, take the last job of the last worker and assign it to the new worker; repeat until the new worker has as many jobs as the first worker, or one fewer.

W1   W2   W3   W4   W5
----------------------
J1   J2   J3   J4   J8
J5   J6   J9

Update the last worker accordingly.
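A sketch of adding a worker under the same assumed layout (`add_worker` is a hypothetical name). Two details are my own reading of the answer rather than something it spells out: the last worker is decremented after each job taken from it, so consecutive moves come from different workers, and "update the last worker accordingly" is done by recomputing it from the total job count, since in a round-robin layout the last worker is always `(total - 1) % number_of_workers`. At least one existing worker is assumed:

```python
def add_worker(workers, last):
    """Append a new worker, fill it from the end of the round-robin order, return the new last index."""
    n = len(workers)
    new = []
    # Take jobs until the new worker has as many as the first worker, or one fewer.
    while len(new) < len(workers[0]) - 1:
        new.append(workers[last].pop())  # last job of the last worker
        last = (last - 1) % n            # the next-to-last assignment becomes the last one
    workers.append(new)
    total = sum(len(jobs) for jobs in workers)
    return (total - 1) % len(workers)


workers = [["J1", "J5"], ["J2", "J6"], ["J3", "J9"], ["J4", "J8"]]
last = add_worker(workers, 3)  # W4 (index 3) is the last worker
print(workers)  # [['J1', 'J5'], ['J2', 'J6'], ['J3', 'J9'], ['J4'], ['J8']]
print(last)     # 2 (W3)
```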

Removing a worker is simple once you have all of the above: take all of its jobs and add them back one at a time. For example, if you remove W2:

W1   W3   W4   W5
-----------------
J1   J3   J4   J8
J5   J9   J2   J6
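Removing a worker then reuses round-robin insertion. In this sketch (same assumed list-of-lists layout; `remove_worker` is a hypothetical name, and `add_job` is repeated so the block runs on its own), the last worker is recomputed after dropping the worker: the remaining job counts are still non-increasing and differ by at most one, so they form a valid round-robin layout by themselves. At least one worker is assumed to remain:

```python
def add_job(workers, last, job):
    """Round-robin insertion: give `job` to the worker after `last`, return its index."""
    nxt = (last + 1) % len(workers)
    workers[nxt].append(job)
    return nxt


def remove_worker(workers, index):
    """Drop worker `index`, re-add its jobs one at a time, and return the new last index."""
    orphans = workers.pop(index)
    # The remaining workers still form a round-robin layout, so the last worker
    # follows from the job count: (total - 1) % number_of_workers.
    total = sum(len(jobs) for jobs in workers)
    last = (total - 1) % len(workers)
    for job in orphans:
        last = add_job(workers, last, job)
    return last


# Remove W2 (index 1) from the five-worker example.
workers = [["J1", "J5"], ["J2", "J6"], ["J3", "J9"], ["J4"], ["J8"]]
last = remove_worker(workers, 1)
print(workers)  # [['J1', 'J5'], ['J3', 'J9'], ['J4', 'J2'], ['J8', 'J6']]
print(last)     # 3 (W5)
```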

Depending on the size of your data, you should use appropriate data structures to make this efficient. But I am sure you know what structures to use.

Nico Schertler
  • There are some cases where your solution is not quite optimal, in the sense that an equally good balance could be maintained with fewer job transfers. For example, after removing job `J7` from `W3` as in your example above, let's remove `J6` from `W2` next. Your algorithm would now transfer job `J8` from `W4` to `W2` to maintain balance, whereas it would be more efficient (in terms of number of transfers needed) to keep the job assignments as they are, and instead *reorder the workers* so that `W2` gets moved to the end of the worker list. – Ilmari Karonen Sep 14 '16 at 12:49
  • In the first case, let's say we have jobs only up to `J7` (no `J8` and `J9`) and then `J6` is removed. Isn't moving `J7` to `W2` unnecessary overhead? – Saurav Sahu Sep 14 '16 at 12:51
  • You are both right. I'm not claiming that this is a perfect solution. Incorporating a worker reordering should be quite easy (e.g. compare the number of jobs of the current worker to the number of jobs of the first worker). Bulk insertion and deletion is an entirely different story, though. – Nico Schertler Sep 14 '16 at 12:55
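The worker reordering suggested in these comments can be sketched as a variant of the job-removal step. This hypothetical `remove_job_reordering` assumes each worker is stored as a `(name, jobs)` pair so that the reordering is visible. Following the check Nico Schertler mentions, if the worker that lost a job had as many jobs as the first worker, it is moved to the end of the block of workers that still hold that many jobs instead of receiving a job; only otherwise is a job moved from the last worker:

```python
def remove_job_reordering(workers, last, job):
    """Remove `job`, preferring to reorder workers over moving a job; return the new last index.

    workers: list of (name, jobs) pairs in round-robin order (hypothetical layout)
    last:    index of the worker that received the most recent job
    """
    owner = next(i for i, (_, jobs) in enumerate(workers) if job in jobs)
    jobs = workers[owner][1]
    had_max = len(jobs) == len(workers[0][1])  # compare with the first worker
    jobs.remove(job)
    if had_max:
        # Slide the owner to position `last`, right behind the workers that still
        # hold the larger count; no job changes worker.
        workers.insert(last, workers.pop(owner))
    else:
        # The owner was already one short: take the last worker's last job.
        jobs.append(workers[last][1].pop())
    return (last - 1) % len(workers)


# Ilmari Karonen's example: remove J6 after J7 has already been removed.
workers = [("W1", ["J1", "J5"]), ("W2", ["J2", "J6"]), ("W3", ["J3", "J9"]), ("W4", ["J4", "J8"])]
last = remove_job_reordering(workers, 3, "J6")
print([name for name, _ in workers])  # ['W1', 'W3', 'W4', 'W2']
print(last)                           # 2 (W4)
```

In Saurav Sahu's example (jobs up to `J7`, then `J6` removed) the same check also applies, so `W2` is moved behind `W3` and `J7` stays where it is.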

To minimize the number of job movements:

Create a list (say, workers) of (worker, numOfJobsAssigned) pairs, and separately maintain a variable maxJobsToAnySingleWorker holding the largest number of jobs currently assigned to any single worker.

Upon reaching equilibrium (i.e. all workers have the same number of jobs), increase maxJobsToAnySingleWorker by 1 and then add the new job.

Start with maxJobsToAnySingleWorker = 0

For addition of a Job:
    Set Done to false
    for each worker in workers
        if worker.numOfJobsAssigned < maxJobsToAnySingleWorker
            Assign Job to worker
            Increase worker.numOfJobsAssigned by 1
            Set Done to true
            break
    if Done is false (equilibrium state)
        Increase maxJobsToAnySingleWorker by 1
        Assign Job to FirstWorker
        Increase FirstWorker.numOfJobsAssigned by 1


For removal of a Job from a worker, say myWorker:
    Done = false
    Remove Job from myWorker
    Decrease myWorker.numOfJobsAssigned by 1
    if myWorker.numOfJobsAssigned == maxJobsToAnySingleWorker-1
        Do nothing
    else
        for each worker in workers
            if (worker.numOfJobsAssigned > 1) and (worker.numOfJobsAssigned == maxJobsToAnySingleWorker)
                Delegate Job from worker to myWorker
                Decrease worker.numOfJobsAssigned by 1
                Increase myWorker.numOfJobsAssigned by 1
                Done = true
                break
        if Done is false (no worker had maxJobsToAnySingleWorker jobs)
            Decrease maxJobsToAnySingleWorker by 1

Following the above logic, a worker can be removed by taking away all of its jobs and then adding each of them to the remaining workers, one by one, with the addition procedure.
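A runnable Python sketch of this scheme, assuming each worker keeps its actual job list rather than only a counter (the class `LevelBalancer` and its methods are hypothetical names). `max_jobs` plays the role of maxJobsToAnySingleWorker. In this sketch it can stay one too high after a removal, and the next removal that finds no worker at that level lowers it instead of moving a job. At least one worker is assumed to exist:

```python
class LevelBalancer:
    """Sketch of the maxJobsToAnySingleWorker scheme; names are illustrative."""

    def __init__(self, worker_names):
        # Insertion-ordered dict: worker name -> list of its jobs.
        self.workers = {name: [] for name in worker_names}
        # maxJobsToAnySingleWorker; may be one too high until a removal corrects it.
        self.max_jobs = 0

    def add_job(self, job):
        for jobs in self.workers.values():
            if len(jobs) < self.max_jobs:
                jobs.append(job)  # first worker below the maximum gets the job
                return
        # Equilibrium: every worker holds max_jobs jobs, so raise the level.
        self.max_jobs += 1
        next(iter(self.workers.values())).append(job)

    def remove_job(self, job):
        """Remove `job`; return the job moved to rebalance, or None if nothing moved."""
        owner = next(jobs for jobs in self.workers.values() if job in jobs)
        owner.remove(job)
        if len(owner) == self.max_jobs - 1:
            return None  # still within one of the maximum
        for jobs in self.workers.values():
            if len(jobs) == self.max_jobs:
                moved = jobs.pop()  # delegate one job from a worker at the maximum
                owner.append(moved)
                return moved
        # Nobody is at the (stale) maximum: lower it instead of moving anything.
        self.max_jobs -= 1
        return None

    def remove_worker(self, name):
        # Re-add the leaving worker's jobs with the addition procedure.
        for job in self.workers.pop(name):
            self.add_job(job)


balancer = LevelBalancer(["W1", "W2", "W3", "W4"])
for i in range(1, 10):
    balancer.add_job(f"J{i}")
print(balancer.remove_job("J7"))  # J9 (moved from W1 to W3)
print(balancer.remove_job("J6"))  # None (no transfer needed)
```

The second removal is the case raised in the comments on the other answer: the counts end up as 2, 1, 2, 2, which is already balanced, so no job moves.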

Saurav Sahu