
I'm currently working on a website that will allow students from my university to automatically generate valid schedules based on the courses they'd like to take.

Before working on the site itself, I decided to tackle the issue of how to schedule the courses efficiently.

A few clarifications:

  1. Each course at our university (and I assume at every other university) consists of one or more sections. For instance, Calculus I currently has 4 sections available. The number of sections, and whether or not the course has a lab, drastically affects the scheduling process.

  2. Courses at our university are represented using a combination of subject abbreviation and course code. In the case of Calculus I: MATH 1110.

  3. The CRN is a code unique to a section.

  4. My university is not mixed: male and female students study on (almost) separate campuses. By "almost" I mean that a single campus is divided into two.

  5. The datetimes and timeranges dicts are meant to reduce calls to datetime.datetime.strptime(), which was a real bottleneck.
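The same caching idea can be expressed with `functools.lru_cache` instead of hand-maintained dicts. This is a minimal sketch; the `"HH:MM"` and `"HH:MM-HH:MM"` string formats and the names `parse_time` and `parse_range` are hypothetical, since the real course_info format may differ.

```python
from datetime import datetime
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_time(text):
    """Parse an (assumed) "HH:MM" string; repeated strings are served from the cache."""
    return datetime.strptime(text, "%H:%M").time()


@lru_cache(maxsize=None)
def parse_range(text):
    """Parse an (assumed) "HH:MM-HH:MM" string into a (start, end) pair of times."""
    start, end = text.split("-")
    return parse_time(start.strip()), parse_time(end.strip())
```

After any number of calls to `parse_range("10:00-10:50")`, `parse_time.cache_info().misses` stays at 2, meaning `strptime()` ran only once per distinct time string.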

My first attempt consisted of an algorithm that looped continuously until 30 schedules were found. Schedules were created by randomly choosing a section from one of the selected courses, and then trying to place sections from the remaining courses to construct a valid schedule. If not all of the courses fit into the schedule (i.e., there were conflicts), the schedule was scrapped and the loop continued.

Clearly, the above solution is flawed. The algorithm took too long to run, and relied too much on randomness.

The second algorithm does the exact opposite of the old one. First, it generates a collection of all possible schedule combinations using itertools.product(). It then iterates through the schedules, crossing off any that are invalid. To get a varied mix of sections, the schedule combinations are shuffled (random.shuffle()) before being validated. Again, there is a bit of randomness involved.
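To make the cost of this product-then-filter approach concrete, here is a minimal sketch. The data layout (course code mapped to a list of `(CRN, meetings)` pairs, each meeting a `(day, start_minute, end_minute)` tuple) and the sample CRNs are hypothetical. Note that `itertools.product()` itself yields combinations lazily; it is wrapping it in `list()` so that `random.shuffle()` can run that forces every combination into memory at once.

```python
import itertools
import random

# Hypothetical sample data: course code -> list of (CRN, meetings),
# where each meeting is (day, start_minute, end_minute) counted from midnight.
COURSES = {
    "MATH 1110": [("10001", [("M", 600, 650)]), ("10002", [("T", 600, 650)])],
    "PHYS 1110": [("20001", [("M", 630, 680)]), ("20002", [("W", 600, 650)])],
}


def overlaps(a, b):
    """True if two meetings fall on the same day and their time ranges intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def is_valid(schedule):
    """A schedule (one section per course) is valid if no two meetings overlap."""
    meetings = [m for _, ms in schedule for m in ms]
    return not any(overlaps(a, b) for a, b in itertools.combinations(meetings, 2))


def brute_force(courses, limit=30, seed=None):
    # list() materializes the whole Cartesian product so it can be shuffled;
    # this is the step whose memory use explodes as courses are added.
    combos = list(itertools.product(*courses.values()))
    random.Random(seed).shuffle(combos)
    valid = (c for c in combos if is_valid(c))
    return list(itertools.islice(valid, limit))
```

With the sample data, three of the four combinations survive; the MATH 10001 / PHYS 20001 pair is dropped because both meet on Monday between 10:30 and 10:50.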

After a bit of optimization, I was able to get the scheduler to run in under 1 second for an average schedule consisting of 5 courses. That's great, but the problem begins once you start adding more courses.

To give you an idea, for a certain set of inputs the number of possible combinations is so large that itertools.product() does not terminate in a reasonable amount of time, and eats up 1GB of RAM in the process.
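The growth is multiplicative: `itertools.product()` yields one combination per choice of section in every course, so the total is the product of the per-course section counts. A small sketch of that arithmetic follows; the section counts in the example are illustrative (3 is the average mentioned in the comments), not real data.

```python
import math


def combination_count(section_counts):
    """Schedules itertools.product() generates: the product of per-course section counts."""
    return math.prod(section_counts)  # math.prod requires Python 3.8+
```

With 3 sections per course, 5 courses give `combination_count([3] * 5) == 243` combinations and 8 courses give 6,561. If some courses instead offer, hypothetically, 10 section and lab pairings each, 8 such courses already give 100,000,000.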

Obviously, if I'm going to make this a service, I'm going to need a faster and more efficient algorithm. Two that have come up online and on IRC: dynamic programming and genetic algorithms.

Dynamic programming cannot be applied to this problem because, if I understand the concept correctly, it involves breaking up the problem into smaller pieces, solving these pieces individually, and then bringing the solutions of these pieces together to form a complete solution. As far as I can see, this does not apply here.

As for genetic algorithms, I don't understand them well, and cannot even begin to fathom how to apply one in this situation. I also understand that a GA would be more efficient for an extremely large problem space, and this one is not that large.

What alternatives do I have? Is there a relatively understandable approach I can take to solve this problem? Or should I just stick to what I have and hope that not many people decide to take 8 courses next semester?

I'm not a great writer, so I'm sorry for any ambiguities in the question. Please feel free to ask for clarification and I'll try my best to help.

Here is the code in its entirety.

http://bpaste.net/show/ZY36uvAgcb1ujjUGKA1d/

Note: Sorry for using a misleading tag (scheduling).

David Robinson
Assil Ksiksi
  • What are you actually asking? This seems very vague and open-ended. FYI, scheduling problems are generally NP-complete. A lot of research has been done on this exact problem; have you tried Google and looking up publications? For example: http://www.aloul.net/Papers/faloul_sch_gcc07.pdf – Austin Henley Nov 06 '12 at 19:23
  • This looks exactly like the knapsack problem. You have a pool of classes and a predetermined time allotment; you must fill a knapsack such that none of the classes in the knapsack overlap and as much time as possible is filled. If the problem is posed with that in mind, then a GA could be applied to efficiently (sometimes) find optimal, within error, course schedules. – sean Nov 06 '12 at 19:25
  • It is a constraint satisfaction problem. – Austin Henley Nov 06 '12 at 19:26
  • I have tried Google, but couldn't find much help. As for publications, all I can say is that I'm still an amateur. I'm asking if there are certain methods or algorithms that can be applied to solve my problem more efficiently. – Assil Ksiksi Nov 06 '12 at 19:26
  • @Cyph0n It is a difficult problem that a lot of people have tried to solve. Looking up publications and searching on Google is your best bet. – Austin Henley Nov 06 '12 at 19:27
  • You need to provide more information regarding your data set. For instance, what is the smallest discrete unit of time that can be considered for writing your scheduler, 15 minutes? If a class runs from 10:05 to 12:10, can you not sufficiently abstract that away as 'busy from 10 to 12:15'? Is there a possibility of a class at 12:12? Most schools have 'blocks' of time available for classes. By operationalizing the blocks, you shrink your search space, and you don't have to do funky manipulations of datetime strings. Re-conceptualize what time is, and go from there. – kreativitea Nov 06 '12 at 19:28
  • Dynamic programming might work in your situation, specifically recursive backtracking. That is, make one course assignment. If it leads to no contradictions, make another. Whenever you reach a contradiction, backtrack. – David Robinson Nov 06 '12 at 19:28
  • @kreativitea The smallest unit of time is probably 50 minutes. Classes usually start at something like 10:00 or 10:30. – Assil Ksiksi Nov 06 '12 at 19:30
  • @DavidRobinson So I'd have something like an infinite loop. It assigns a course to a schedule. If there is a contradiction, it moves backwards? Meaning I'd have to somehow keep track of assignments. Am I on the right track? – Assil Ksiksi Nov 06 '12 at 19:32
  • @Cyph0n - some friends at my university recently built a large-scale course-scheduling platform: http://uwflow.com - you can try contacting the authors to see how they approached the problem – sampson-chen Nov 06 '12 at 19:33
  • @Cyph0n: It wouldn't be an infinite loop. Recursive backtracking is (as its name implies) usually done with recursive function calls. This lets all the assignments be kept track of in each function call's own environment. Imagine going through a maze using depth-first search: you keep taking the left route until you can't, then you backtrack until you get to the last place where you haven't tried all the options, and take one of the untaken options. – David Robinson Nov 06 '12 at 19:37
  • The problem of scheduling course times, rooms, etc. for the entire university is famously hard, yes. But the OP already has the university's schedule, no? The OP is just trying to find a compatible subset for a particular student's interests. No need to employ nuclear devices to solve this. – A. Webb Nov 06 '12 at 19:47
  • How many courses are available in total? What is the mean number of sections per course? – randomhuman Nov 06 '12 at 19:49
  • @Cyph0n Then your smallest unit of time is 30 minutes. Divide up the day into 30-minute bins, from the earliest class to the latest class. 5 days a week × 24 bins per day = 120 total bins. You can give every class a binary representation of its time (a 120-bit-long bitarray), and keep that in memory. That way, when you look for a schedule, all you have to do is compute c = `a|b`, and if the number of set bits in c is not equal to the number in a plus the number in b, you have a collision. Using this method of collision detection, 6! (720) comparisons will be practically instantaneous. (No string manipulation, yay!) – kreativitea Nov 06 '12 at 19:50
  • @randomhuman In total, the university offers around 1000 courses. The average number of sections per course is 3. – Assil Ksiksi Nov 06 '12 at 19:56
  • @Cyph0n OK, I think that is too many for what I was going to suggest... – randomhuman Nov 06 '12 at 20:05
  • @Cyph0n Can we get some sample input (e.g. what does an entry in course_info look like)? Also, you're checking gender way too late in the chain, considering it basically divides your search field in two; going through the list of results and popping the results that don't match the gender is pretty inefficient. You should compile a different dictionary based on the gender. – kreativitea Nov 06 '12 at 20:47
  • @kreativitea A course_info entry looks like this: http://sebsauvage.net/paste/?043c868816fd424d#LgJKAn4VxBVJRsr80icuMz5XYdTNWCO9RGyBA+C2QRE= – Assil Ksiksi Nov 07 '12 at 02:19
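Two of the suggestions in these comments, kreativitea's 30-minute bins stored as bit masks and David Robinson's recursive backtracking, combine naturally. Below is a minimal sketch under stated assumptions: a five-day week with hypothetical day codes `"MTWRF"`, days starting at 08:00 with 24 half-hour bins, and a hypothetical data layout of course code mapped to `(CRN, meetings)` pairs with meetings as `(day, start_minute, end_minute)`. A plain Python `int` serves as the 120-bit array; `used & mask` being non-zero is equivalent to kreativitea's set-bit count test.

```python
# Assumptions (hypothetical): 5 teaching days, day starts at 08:00,
# 24 half-hour bins per day -> 5 * 24 = 120 bits per schedule.
DAYS = "MTWRF"
DAY_START = 8 * 60
BIN_MINUTES = 30
BINS_PER_DAY = 24

# Hypothetical sample data: course code -> list of (CRN, meetings).
COURSES = {
    "MATH 1110": [("10001", [("M", 600, 650)]), ("10002", [("T", 600, 650)])],
    "PHYS 1110": [("20001", [("M", 630, 680)]), ("20002", [("W", 600, 650)])],
}


def to_mask(meetings):
    """Encode (day, start_minute, end_minute) meetings as a 120-bit integer."""
    mask = 0
    for day, start, end in meetings:
        offset = DAYS.index(day) * BINS_PER_DAY
        first = (start - DAY_START) // BIN_MINUTES
        last = (end - 1 - DAY_START) // BIN_MINUTES  # bin holding the final minute
        for b in range(first, last + 1):
            mask |= 1 << (offset + b)
    return mask


def find_schedules(courses, limit=30):
    """Recursive backtracking: pick one section per course, undo on a clash."""
    # Compute each section's mask once, up front.
    options = [[(crn, to_mask(ms)) for crn, ms in secs] for secs in courses.values()]
    options.sort(key=len)  # heuristic: courses with the fewest sections first
    results = []

    def place(i, used, chosen):
        if len(results) >= limit:
            return
        if i == len(options):  # every course has a section: record it
            results.append(list(chosen))
            return
        for crn, mask in options[i]:
            if used & mask:  # shares a half-hour bin with a chosen section
                continue
            chosen.append(crn)
            place(i + 1, used | mask, chosen)
            chosen.pop()  # backtrack and try the next section

    place(0, 0, [])
    return results
```

`find_schedules(COURSES)` returns lists of CRNs, ordered by the sorted course list rather than by course code. A clash in an early course prunes every combination below it, which is what avoids enumerating the full product. Binning is conservative: two meetings that merely share a half-hour bin (say one ending at 10:40 and another starting at 10:45) are reported as a clash.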

5 Answers


Scheduling is a very famous constraint satisfaction problem that is generally NP-Complete. A lot of work has been done on the subject, even in the same context as you: Solving the University Class Scheduling Problem Using Advanced ILP Techniques. There are even textbooks on the subject.

People have taken many approaches, including:

You need to reduce your problem space and complexity. Make as many assumptions as possible (maximum number of classes, block-based timing, etc.). There is no silver bullet for this problem, but it should be possible to find a near-optimal solution.

Some semi-recent publications:

Austin Henley

Have you read anything about genetic programming? The idea behind it is that you let the 'thing' you want solved evolve by itself until it has grown into the best solution(s) possible.

You generate a thousand schedules, of which usually none are anywhere near valid. Next, you randomly change 'some' courses. From these new schedules you select some of the best, based on ratings you give according to the 'goodness' of each schedule. Next, you let them reproduce by combining some of the courses from both schedules. You end up with a thousand new schedules, all of them a tiny fraction better than the ones you had. Repeat until you are satisfied, and select the schedule with the highest rating from the last thousand you generated.

There is randomness involved, I admit, but the schedules keep getting better for as long as you let the algorithm run. Just like real life and organisms, there is survival of the fittest, and you can see different general 'threads' of the same kind of schedule emerge, each about as good as the others. Two very different schedules can finally 'battle' it out by cross-breeding.
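Here is a compact sketch of the loop described above, to show what rating, reproduction, and random change can look like in code. It is one possible genetic algorithm, not the one from the linked article: a genome holds one section index per course, fitness is simply the number of clashing meeting pairs (lower is better), selection keeps the better half, crossover is one-point, and mutation re-rolls genes at random. The data layout (course code mapped to `(CRN, meetings)` pairs, meetings as `(day, start_minute, end_minute)`), the sample CRNs, and all parameter values are hypothetical. Unlike exhaustive search, a GA can stop without finding a clash-free schedule, so the returned clash count has to be checked.

```python
import itertools
import random

# Hypothetical sample data: course code -> list of (CRN, meetings),
# each meeting being (day, start_minute, end_minute).
COURSES = {
    "MATH 1110": [("10001", [("M", 600, 650)]), ("10002", [("T", 600, 650)])],
    "PHYS 1110": [("20001", [("M", 630, 680)]), ("20002", [("W", 600, 650)])],
}


def overlaps(a, b):
    """True if two meetings fall on the same day and their time ranges intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def clashes(sections, genome):
    """Fitness: number of overlapping meeting pairs (0 means a valid schedule)."""
    meetings = [m for course, idx in zip(sections, genome) for m in course[idx][1]]
    return sum(overlaps(a, b) for a, b in itertools.combinations(meetings, 2))


def evolve(courses, pop_size=100, generations=200, mutation_rate=0.1, seed=None):
    rng = random.Random(seed)
    sections = list(courses.values())
    sizes = [len(s) for s in sections]

    def fitness(genome):
        return clashes(sections, genome)

    # Start from random genomes: one section index per course.
    population = [[rng.randrange(n) for n in sizes] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        if fitness(population[0]) == 0:
            break  # found a clash-free schedule
        parents = population[: pop_size // 2]  # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, len(sizes)) if len(sizes) > 1 else 0
            child = mom[:cut] + dad[cut:]  # one-point crossover
            for i, n in enumerate(sizes):  # mutation: occasionally re-roll a gene
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n)
            children.append(child)
        population = parents + children
    best = min(population, key=fitness)
    chosen = {code: secs[i][0] for (code, secs), i in zip(courses.items(), best)}
    return chosen, fitness(best)
```

For example, `evolve(COURSES, seed=1)` returns a `{course: CRN}` dict plus its clash count; a count of 0 means the schedule is valid.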

A project involving school schedules and genetic programming: http://www.codeproject.com/Articles/23111/Making-a-Class-Schedule-Using-a-Genetic-Algorithm

I think they explain pretty well what you need.

My final note: I think this is a very interesting project. It is quite difficult to make, but once done it is just great to see your solution evolve, just like real life. Good luck!

Hidde

The way you're currently generating combinations of sections is probably throwing up huge numbers of combinations that are excluded by conflicts between more than one course. I think you could reduce the number of combinations that you need to deal with by generating the product of the sections for only two courses first. Eliminate the conflicts from that set, then introduce the sections for a third course. Eliminate again, then introduce a fourth, and so on. This should see a more linear growth in the processing time required as the number of courses selected increases.
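A minimal sketch of this incremental product follows. The data layout (course code mapped to `(CRN, meetings)` pairs, meetings as `(day, start_minute, end_minute)`) and sample CRNs are hypothetical. Each round extends only the partial schedules that are still clash-free, so a conflict found between the first two courses is never multiplied by the sections of the remaining courses. How much this saves depends on how many sections actually clash; with few conflicts, the number of surviving partial schedules can still approach the full product.

```python
# Hypothetical sample data: course code -> list of (CRN, meetings),
# each meeting being (day, start_minute, end_minute).
COURSES = {
    "MATH 1110": [("10001", [("M", 600, 650)]), ("10002", [("T", 600, 650)])],
    "PHYS 1110": [("20001", [("M", 630, 680)]), ("20002", [("W", 600, 650)])],
}


def overlaps(a, b):
    """True if two meetings fall on the same day and their time ranges intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def clashes_with(partial, section):
    """True if `section` overlaps any section already in the partial schedule."""
    return any(overlaps(m, n) for chosen in partial for m in chosen[1] for n in section[1])


def incremental_product(courses):
    partials = [()]  # start with one empty schedule
    for sections in courses.values():
        # Extend each surviving partial schedule with every compatible section.
        partials = [p + (s,) for p in partials for s in sections if not clashes_with(p, s)]
        if not partials:
            break  # no clash-free schedule exists for the courses so far
    return partials
```

`incremental_product(COURSES)` returns one tuple of `(CRN, meetings)` sections per clash-free combination.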

randomhuman

This is a hard problem. If you google something like 'course scheduling problem paper', you will find a lot of references. Genetic algorithm: no; dynamic programming: yes. GAs are much harder to understand and implement than standard DP algorithms. Usually, people who use GAs out of the box don't understand standard techniques. Do some research and you will find different algorithms; you might even be able to find some implementations. Coming up with your own algorithm is way, way harder than putting some effort into understanding DP.

RParadox

The problem you're describing is a Constraint Satisfaction Problem. My approach would be the following:

  • Check whether there are any incompatibilities between courses; if so, record them as constraints (arcs)
  • While no solution has been found:
    • Select the course with the fewest constraints (that is, the fewest incompatibilities with other courses)
    • Run the AC-3 algorithm to reduce the search space

I've tried this approach for solving sudoku and it worked (it solved "the hardest sudoku in the world" in less than 10 seconds).
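A minimal sketch of that approach: each course is a variable whose domain is its list of sections, with a "no overlapping meetings" constraint between every pair of courses. The data layout (course code mapped to `(CRN, meetings)` pairs, meetings as `(day, start_minute, end_minute)`) and sample CRNs are hypothetical. One deliberate swap: when choosing which course to branch on, this sketch uses the common minimum-remaining-values rule (fewest sections left after pruning) rather than counting incompatibilities, and it assigns a section before each new AC-3 pass, a step the list above leaves implicit.

```python
from collections import deque

# Hypothetical sample data: course code -> list of (CRN, meetings),
# each meeting being (day, start_minute, end_minute).
COURSES = {
    "MATH 1110": [("10001", [("M", 600, 650)]), ("10002", [("T", 600, 650)])],
    "PHYS 1110": [("20001", [("M", 630, 680)]), ("20002", [("W", 600, 650)])],
}


def overlaps(a, b):
    """True if two meetings fall on the same day and their time ranges intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def compatible(sec_a, sec_b):
    """Two sections are compatible if none of their meetings overlap."""
    return not any(overlaps(m, n) for m in sec_a[1] for n in sec_b[1])


def ac3(domains):
    """Remove sections with no compatible section in some other course.

    Rebinds entries of `domains` in place; returns False if any domain empties.
    """
    queue = deque((x, y) for x in domains for y in domains if x != y)
    while queue:
        x, y = queue.popleft()
        kept = [a for a in domains[x] if any(compatible(a, b) for b in domains[y])]
        if len(kept) < len(domains[x]):
            domains[x] = kept
            if not kept:
                return False
            # x lost values, so arcs pointing at x must be rechecked.
            queue.extend((z, x) for z in domains if z not in (x, y))
    return True


def solve(domains):
    """Backtracking with AC-3 after every assignment; returns {course: CRN} or None."""
    domains = dict(domains)  # ac3 rebinds entries; leave the caller's dict alone
    if any(not d for d in domains.values()) or not ac3(domains):
        return None
    open_courses = [c for c in domains if len(domains[c]) > 1]
    if not open_courses:  # every course is down to one section: a solution
        return {c: d[0][0] for c, d in domains.items()}
    course = min(open_courses, key=lambda c: len(domains[c]))
    for section in domains[course]:
        trial = dict(domains)
        trial[course] = [section]  # assign, then let AC-3 prune the rest
        result = solve(trial)
        if result is not None:
            return result
    return None
```

Because every pair of courses is constrained, a state in which AC-3 succeeds and every domain has exactly one section is already a valid schedule, which is why `solve()` can stop there.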