I’ve been thinking about scheduling and load balancing algorithms, and I came up with a problem that I think is interesting.
There are N cages and M zookeepers. Each cage has a size S and a number of animals A. The frequency with which a cage must be cleaned is computed as some function of A / S (smaller cages with more animals get dirty faster). The difficulty of cleaning a cage is computed as some other function of A and S, the details of which are unimportant (the size of a cage contributes most of the difficulty, and the number of animals contributes a little). Once every three days, any cages that are due for cleaning are cleaned (“cleaning day”). Zookeepers are completely identical and interchangeable. Zookeepers need to do a similar amount of work each cleaning day, and to not have to do much more work than any other zookeeper. The duration of time that a cage takes to clean is not part of the problem (it's assumed that time is roughly reflected by difficulty, and that there is always enough time in the day for a zookeeper to complete their assigned tasks).
The scheduling algorithm must tell each zookeeper which cages to clean on each cleaning day, such that
- each cage is cleaned at its ideal frequency or as close as possible,
- assigning a minimal and roughly equal number of cleanings per zookeeper per cleaning day,
- and assuring as equal a workload as possible across all zookeepers (i.e., over a period of time, the aggregate difficulties of each zookeeper’s workload are as close to equal as possible, and cages are distributed among zookeepers with roughly 1/M probability).
I’m wondering what an approximation algorithm for such an optimization problem would look like. It bears a resemblance to several different classic examples of NP-hard scheduling/resource utilization problems; maybe it is reducible to one such problem and I’m just missing it. If we get rid of the frequency/periodicity of tasks element, it is similar to the classic multiprocessor scheduling or finite bin packing problem.