I'm writing a Python application that uses OpenStack to give students access to a limited number of virtual machines.

Students can place reservations, either now or in the future.

I need to limit the number of virtual machines scheduled at any one time to X, while still allowing students to reserve VMs if slots/reservations are available.

Reservation objects look like the following (SQLAlchemy). I would know the start time and the length of the requested reservation, at which point I need to go through the existing reservations and see whether there are already too many in the requested time period. The *_job fields are the names of APScheduler jobs.

class Reservation(Entity):
    student = ManyToOne('Student', required=True)
    class_id = ManyToOne('Class', required=True)
    image = ManyToOne('Image', required=True)
    # OpenStack instance id, filled in once the instance is started
    instance_id = Field(UnicodeText)

    # apscheduler jobs
    stop_instance_job = Field(UnicodeText)
    start_instance_job = Field(UnicodeText)
    warn_reservation_ending_job = Field(UnicodeText)
    check_instance_job = Field(UnicodeText)

Any pointers on where to look for examples of scheduling algorithms or something like that? I'm not even clear on what to search for...

Thanks.

curtis
  • This strikes me as an application for Dijkstra's Banker's Algorithm which is normally not discussed much in job scheduling as its preconditions (notably execution time) are hard to know in advance but which you have. The general class of problem is "Batch Scheduling" – msw Jul 29 '12 at 10:59
  • Great. Thanks kindly for that. :) – curtis Jul 29 '12 at 13:38
  • +1 for well-phrased, short, but complete question. – Jeff Tratner Jul 29 '12 at 19:38
  • Far from optimal solution in general case, but to get you started without any optimizations involved (especially when someone deletes the Reservations): **What to search for?** - Well, you need to search for all `Instance`s that do not have any reservations between the `StartTime` and `EndTime`. – van Jul 30 '12 at 12:54
  • Sorry, what I meant in terms of what to search for was what to google for scheduling. :) – curtis Aug 01 '12 at 15:07

1 Answer

You should look up grid-based schedulers. Normally schedulers don't know the true execution time (or time of resource use), so complicated heuristics are used to guess how long a job will take (for examples of such heuristics on a grid scheduler, see the PDF "Describing Scheduling on Grid basis"). A simpler approach, with a basic grid representing workload over time, will most likely meet your needs. Python doesn't have any awesome grid object libraries that I know of (I've implemented a few in C++ and Python before, though, and they're not too hard). Look at the numpy package, which makes multi-dimensional arrays easier to work with and can emulate or implement grids easily enough.

msw mentioned Dijkstra's Banker's Algorithm, which is a resource-allocation (deadlock-avoidance) algorithm that can be applied to job scheduling. However, your problem cares about future state more than current state, and you can accurately predict (know the true value of) task times. Thus a T (timesteps) by N (number of resources, which might be just 1) by M (max resource reservations) grid, which you fill in as jobs are registered, would suffice. Determining whether a particular job can be scheduled in a particular timeslot takes O(task_length * M) checks on the subsection of the grid (start, stop) x (required_resources) x (1, M), looking for an empty slot.
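As a rough illustration of the timeslot check, here is a simplified sketch. It assumes a single resource type (N = 1) and collapses the M dimension into a per-timestep count, so the check costs O(task_length) instead of O(task_length * M). The `ReservationGrid` class, its method names, and the use of integer timestep indices are all hypothetical, not part of any library; a numpy array could stand in for the plain list.

```python
class ReservationGrid:
    """Per-timestep usage counts for one pool of VMs (hypothetical helper)."""

    def __init__(self, num_timesteps, max_vms):
        self.max_vms = max_vms              # the limit "X" from the question
        self.usage = [0] * num_timesteps    # VMs reserved at each timestep

    def can_reserve(self, start, length):
        """True if every timestep in [start, start + length) has a free VM."""
        end = start + length
        if start < 0 or length <= 0 or end > len(self.usage):
            return False
        return all(count < self.max_vms for count in self.usage[start:end])

    def reserve(self, start, length):
        """Record the reservation if it fits; return whether it was recorded."""
        if not self.can_reserve(start, length):
            return False
        for t in range(start, start + length):
            self.usage[t] += 1
        return True


grid = ReservationGrid(num_timesteps=10, max_vms=2)
grid.reserve(0, 5)   # occupies timesteps 0-4
grid.reserve(2, 5)   # occupies timesteps 2-6
```

After those two calls, timesteps 2 through 4 are full, so a request that touches any of them is refused while one that starts at timestep 5 still fits. Cancelling a reservation would decrement the same range.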

Finding an adequate location for a particular job (picking the start time) is a more difficult task and could be achieved with a modified Dijkstra's algorithm, or with any standard scheduler (msw's comment is more helpful for this task than for a timeslot capacity check). Note that a lot of the scheduler content online is specific to OS process scheduling, which cares more about the type of operation (I/O or not) and the penalties for taking longer than expected than about abstract resource use. So Google searches for schedulers will often give you Linux scheduler implementations rather than techniques for arbitrary data. Try looking up shortest-job schedulers (often described as shortest-job-first), which tend to be simpler and less tied to OS tasks when explained.
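For a first attempt at picking a start time, a plain first-fit scan over the per-timestep counts may be enough. The sketch below is that simpler technique, not the modified Dijkstra approach mentioned above, and it assumes the same hypothetical representation: a list `usage` of reserved-VM counts per timestep and a limit `max_vms`. The function name and parameters are made up for illustration.

```python
def earliest_start(usage, max_vms, length, not_before=0):
    """Return the first start index >= not_before that has `length`
    consecutive timesteps with a free VM, or None if there is none."""
    if length <= 0:
        return None
    run_start = not_before                  # start of the current free run
    for t in range(not_before, len(usage)):
        if usage[t] >= max_vms:
            run_start = t + 1               # timestep is full: restart the run
        elif t - run_start + 1 == length:
            return run_start                # free run is long enough
    return None


usage = [2, 2, 1, 0, 2, 0, 0, 0]
slot = earliest_start(usage, max_vms=2, length=3)
```

The scan is linear in the number of timesteps, which should be fine for a small pool of VMs and a bounded booking horizon.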

Pyrce