Generate closest teams based on employee schedules C#

Question

I am given a csv of employee schedules with columns:

employee ID, first last name, sunday schedule, monday schedule, ... , saturday schedule

1 week schedule for each employee. I've attached a screenshot of a portion of the csv file. The total file has around 300 rows.

I need to generate teams of 15 based on the employees' schedules (locations don't matter) so that the employees on each team have the closest schedules to each other. Pseudocode of what I have tried:

parse csv file into array of schedules (my own struct definition)
match employees who have the same exact schedule into teams (creates ~5 full sized teams, 20 - 25 half filled teams, leaves ~50 schedules who don't match with anyone)
for i = 1 to 14, for each member of teams of size i, find the team with the closest schedule (as a whole) and add the member to that team. Once a team reaches size 15, mark them as "done".

This worked somewhat but definitely did not give me the best teams. My question is does anyone know a better way to do this? Pseudocode or just a general idea will help, thanks.

EDIT: Here is an example of the formula of comparison The comparison is based on half hour blocks of difference between the agents schedules. Agent 25 has a score of 16 because he has a difference of 8 half hours with Agent 23 and 24. The team's total score is 32 based on everyone's scores added together.

Not all agents work 8 hour days, and many have different days off, which have the greatest effect on their "closeness" score. Also, a few agents have a different schedule on a certain day than their normal schedule. For example, one agent might work 7am - 3pm on mondays but work 8am - 4pm on tuesday - friday.

When the employees on each team have the smallest difference in their schedules with each other. — obl, Jun 07 '17 at 21:43
@mrogers: That's the real question here. Once you have the metric, you can start to optimize. — Stefan Steiger, Jun 07 '17 at 21:44
You need to define a lot better "best teams". For example, if creating a team A with the exact same schedules the left employees create a team B where the schedules are sepparated by a week, what should the algorithm do? Keep theses or try to find two groups where the difference is lower sacrificing the exactitude of the first one? — Gusman, Jun 07 '17 at 21:51
@Gusman Full size teams of 15 that have the exact same schedule are left alone, because they have a perfect score of 0. Ideally all the teams would be this way, where every employee on a certain team has the same exact schedule. — obl, Jun 07 '17 at 21:54
I think you didn't got my point. Suppose you have 30 persons to form two groups, you match team A to be 0 points but it leaves a team B with 100 points (or any extremely aberrant value), but if you replan the teams you can achieve two teams of 4 points, aren't those better in a global view of the set? That's what I mean, you need to define a lot better these types of cases. — Gusman, Jun 07 '17 at 21:56
@Gusman what I meant was that the very first step of the algorithm is to match the employees who have the same exact schedule into teams. If there is a team of size 15 with the same exact schedule, those are out of the equation for the rest of the algorithm. The point that you are making is part of my question. I'm not sure how to handle further optimization, so if you have any ideas that would help a lot. — obl, Jun 07 '17 at 22:00
You need to ask to the person who assigned you this task, don't suppose anything, I can consider that a better set (lower total score, higher individual scores) is better over a single perfect group and multiple aberrant groups (higher total score, lower individual score for one, higher individual score for multiple) but other person can consider the opposite. — Gusman, Jun 07 '17 at 22:02
@Gusman The person who assigned me this task just wants the best teams overall, I just incorrectly assumed that having some perfect groups and multiple aberrant groups would yield the best overall result. So the best overall result is having the lowest combined score of all the teams in the comparison method I posted in the end of my question. — obl, Jun 07 '17 at 22:06
Hmmm, this is just an idea, if all the employees have the same number of hours per day and their work days are consecutive (I see two days and same schedule), if you take the first work hour of the first day (sat 8AM) as a timespan, sort the employees by this timespan and take groups of 15 employees wouldn't that give the result you want? — Gusman, Jun 07 '17 at 22:16
@Gusman Sorry the example is misleading, not all the agents work 8 hour days, although most do. Some work 8:30 hours, and a few work 3 - 6 hour days. Also many have different days off. I will edit my question to include this info. — obl, Jun 07 '17 at 22:23
Then how does those values modify the scores? I mean, if you match a worker who works 3 hours and other who works 8 hours but with the same schedule how many points will give that? And if one day matches and other not how does that modify the score? You lack a ton of information, first you need to define those rules, and then after that you can write the code. Also, as a note, S.O. is not the right place for this kind of question S.O. is meant to solve concrete doubts, not to ask general questions. — Gusman, Jun 07 '17 at 22:27
The phrase that you want to search for is "simulated annealing". Assign everyone randomly. Keep randomly testing swaps, doing them if they are good, and possibly doing them if they are only somewhat bad. The odds of doing them if they are only somewhat bad go up if they aren't very bad, and down over time. Eventually you'll settle on a locally optimal solution which tends to be globally pretty good. — btilly, Jun 08 '17 at 00:40

score 1 · Answer 1 · edited Jun 18 '17 at 10:24

Can't say for sure about the schedules, but in string algorithms you can find an edit distance calculation. The idea is to define number of operations you need to perform to get one string from another. For example, distance between kitten and sitting is 3, 2 for substitutions and 1 for deletion. I think that you can define a metric between two employees' schedule in similar way.

Now, after you have a distance function, you may start a clusterization. The k-means algorithm may be a good start for you, but it's main disadvantage is that the number of groups is fixed initially. But I think that you can easily adjust general logic for your needs. After that, you may try some additional ways to cluster your data, but you really should start with your distance function, and then simply optimize it on your employee records.

score 1 · Accepted Answer · answered Jun 08 '17 at 05:12

Unless you find a method that gets you an exact best answer, I would add a hill-climbing phase at the end that repeatedly checks to see if swapping any pair of agents between teams would improve things, and swaps them if this is the case, only stopping when it has rechecked every pair of agents and there are no more improvements to be made.

I would do this for two reasons:

1) Such hill-climbing finds reasonably good solutions surprisingly often.

2) People are good at finding improvements like this. If you produce a computer-generated schedule and people can find simple improvements (perhaps because they notice they are often scheduled at the same time as somebody from another team) then you're going to look silly.

Thinking about (2) another way to find local improvements would be to look for cases where a small number of people from different teams are scheduled at the same time and see if you can swap them all onto the same team.

Generate closest teams based on employee schedules C#

2 Answers2