
I have generated a sample dataset of 1000 people with their available timeslots during the day. Each timeslot is a 30-minute interval. 0 indicates that the person is free during that timeslot and 1 indicates that they are busy.

For example:

| Time        | Sally | Mark | Nish |
| ------------| ----- | ---- | ---- |
| 0900 - 0930 |   0   |  1   |   1  |
| 0930 - 1000 |   1   |  0   |   1  |
| 1000 - 1030 |   1   |  1   |   1  |
| 1030 - 1100 |   1   |  0   |   1  |
| 1100 - 1130 |   0   |  1   |   1  |
| 1200 - 1230 |   1   |  0   |   0  |
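One way to hold this data in Python is a dict that maps each person to a list of 0/1 flags, one per slot. The sketch below encodes the example table that way (the names `SLOTS`, `availability` and `common_free_slots` are illustrative, not from the question) and checks whether a group shares a free slot, i.e. a slot in which every member has a 0.

```python
# Example table, one list per person, one flag per slot (0 = free, 1 = busy).
SLOTS = ["0900-0930", "0930-1000", "1000-1030", "1030-1100", "1100-1130", "1200-1230"]
availability = {
    "Sally": [0, 1, 1, 1, 0, 1],
    "Mark":  [1, 0, 1, 0, 1, 0],
    "Nish":  [1, 1, 1, 1, 1, 0],
}


def common_free_slots(group, availability):
    """Return the slot indices in which every person in `group` is free (0)."""
    n_slots = len(availability[group[0]])
    return [
        i for i in range(n_slots)
        if all(availability[person][i] == 0 for person in group)
    ]


print([SLOTS[i] for i in common_free_slots(["Mark", "Nish"], availability)])
# ['1200-1230']
print(common_free_slots(["Sally", "Mark"], availability))
# []
```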

I want to split people into as many groups of 5 as possible, where the members of each group have at least one available timeslot in common. The groups must be mutually exclusive (no person can be in more than one group), and I want to maximize the number of successful groups that are created.

At present, I am using a pretty crude algorithm. I sample 5 people from the dataset and check whether they have an available timeslot in common. If they do, I remove them from the dataset and repeat the process. If they do not, I draw another sample of 5 and keep trying until I find one with a common timeslot. If it is unable to find such a sample after 1000 resamples, it stops.
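A minimal sketch of the resampling approach described above, assuming the data is held as a dict from each person to a list of 0/1 flags (0 = free). `random_grouping` and its parameters are hypothetical names; the stopping rule follows the description and gives up after `max_tries` failed samples in a row.

```python
import random


def random_grouping(availability, group_size=5, max_tries=1000, seed=None):
    """Draw random groups until one shares a free slot, remove it, and repeat.
    Stops after `max_tries` consecutive failed samples."""
    rng = random.Random(seed)
    remaining = list(availability)
    groups = []
    while len(remaining) >= group_size:
        for _ in range(max_tries):
            sample = rng.sample(remaining, group_size)
            n_slots = len(availability[sample[0]])
            # Valid if some slot is free (0) for every sampled person.
            if any(all(availability[p][i] == 0 for p in sample)
                   for i in range(n_slots)):
                groups.append(sample)
                for p in sample:
                    remaining.remove(p)
                break
        else:
            # No valid sample in max_tries attempts: give up.
            break
    return groups
```

Passing a `seed` makes a run reproducible, which helps when comparing this against other approaches on the same dataset.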

This seems very inefficient to me and I was wondering if there is a better way to do this.

aryann
  • Please elaborate on what you mean by "maximum number of groups". What is the interaction among those groups? Do you need a set of mutually exclusive groups? – Prune May 04 '21 at 20:49
  • Hi, yes the groups need to be mutually exclusive. By maximum number of groups I was trying to convey that I am measuring the utility of the algorithm in terms of the total number of people who are successfully put in a group with a common timeslot – aryann May 04 '21 at 20:52
  • If I understand what you are doing, might it be sub-optimal? Might grouping, say, a, b, c, d and e and taking them out eliminate, say, two other possible groupings where you needed c for one and e for the other? – ViennaMike May 05 '21 at 02:05
  • Also, I'm not sure, but this Q&A may be helpful; the accepted answer looks relevant: https://stackoverflow.com/questions/2697183/quickest-algorithm-for-finding-sets-with-high-intersection – ViennaMike May 05 '21 at 02:12

2 Answers


I would tackle the task according to the following pseudocode:

```
For every timeslot, count the number of available people.
Sort the timeslots in ascending order of available people.

For every person, count the number of available timeslots.
Sort the people in ascending order of available timeslots.

While people are available:
    Enumerate the sorted list of timeslots:
        Repeat for the timeslot as long as 5+ people are available:
            Collect the first five available people (fewest available timeslots first).
            Remove them as one group from the list of people.
            Decrease the availability counts of every timeslot they were available in.
    Break the while loop if no group was formed during the iteration.
```
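A minimal Python sketch of the pseudocode above, assuming availability is stored as a dict mapping each person to a list of 0/1 flags (0 = free). The function name `greedy_groups` is hypothetical. Slots are re-ranked by the number of still-ungrouped people at the start of every pass, and within a slot the people with the fewest free slots are grouped first.

```python
def greedy_groups(availability, group_size=5):
    """Greedy grouping: scarce timeslots first and, within a slot, people with
    the fewest free slots first. Fast, but not guaranteed to be optimal."""
    n_slots = len(next(iter(availability.values()), []))
    # Number of free slots per person: fewer free slots = fewer choices.
    free_count = {p: row.count(0) for p, row in availability.items()}
    remaining = set(availability)
    groups = []  # list of (slot index, [members])
    while remaining:
        formed = False
        # Rank slots by how many ungrouped people are free in them, rarest first.
        slot_order = sorted(
            range(n_slots),
            key=lambda s: sum(availability[p][s] == 0 for p in remaining),
        )
        for s in slot_order:
            # Ungrouped people free in slot s, least flexible first
            # (name as tie-break so the result is deterministic).
            candidates = sorted(
                (p for p in remaining if availability[p][s] == 0),
                key=lambda p: (free_count[p], p),
            )
            while len(candidates) >= group_size:
                group = candidates[:group_size]
                candidates = candidates[group_size:]
                groups.append((s, group))
                remaining.difference_update(group)
                formed = True
        if not formed:
            break
    return groups
```

Removing a group only ever shrinks the candidate lists of other slots, so the counts never need to be incremented; recomputing them from `remaining` each pass keeps the bookkeeping simple.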

The rationale behind the sorting is to use rare timeslots first and to assign people with few choices first. This leaves the crowded timeslots and the people with many choices for the final rounds, which increases the chance of forming more groups.

Axel Kemper
  • This is a typical "greedy" algorithm. It is a great approach if you don't need a provably optimal solution, and it should come close to optimal for any reasonably distributed groups. – Prune May 04 '21 at 21:12
  • Thanks for this. Where do I account for sorting the people in ascending order of available timeslots? – aryann May 04 '21 at 21:17
  • You could define a class `Person` which has a list of available timeslots as property. `Person` should have a constructor method to create the object from your data row. Then define a class `Timeslot` which has a list of available `Person` objects as property. Finally, define a class `PersonGroup` which lists the grouped `Person` objects. How to create and sort a list of objects is described in Python textbooks and tutorials. – Axel Kemper May 04 '21 at 21:32
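A sketch of the classes suggested in the comment above, using Python dataclasses. `Person` and `Timeslot` follow the comment; `from_row`, `build` and the field names are hypothetical, and `Person` stores slot indices rather than `Timeslot` objects for simplicity. The sorting happens in `build`: people are sorted once by their number of free slots with `list.sort(key=...)`, and because they are appended to each timeslot in that order, every `Timeslot.people` list is already least-flexible-first.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    free_slots: list[int]  # indices of the slots where this person is free

    @classmethod
    def from_row(cls, name: str, row: list[int]) -> Person:
        """Build a Person from one data row of 0/1 flags (0 = free)."""
        return cls(name, [i for i, busy in enumerate(row) if busy == 0])


@dataclass
class Timeslot:
    index: int
    people: list[Person] = field(default_factory=list)  # available people


def build(rows: dict[str, list[int]]) -> tuple[list[Person], list[Timeslot]]:
    people = [Person.from_row(name, row) for name, row in rows.items()]
    # Sort people ascending by number of free slots (fewest choices first).
    people.sort(key=lambda p: len(p.free_slots))

    n_slots = len(next(iter(rows.values()), []))
    slots = [Timeslot(i) for i in range(n_slots)]
    # Append in sorted order, so each Timeslot.people is least-flexible-first.
    for person in people:
        for i in person.free_slots:
            slots[i].people.append(person)

    # Sort timeslots ascending by number of available people (rarest first).
    slots.sort(key=lambda t: len(t.people))
    return people, slots


people, slots = build({
    "Sally": [0, 1, 1, 1, 0, 1],
    "Mark":  [1, 0, 1, 0, 1, 0],
    "Nish":  [1, 1, 1, 1, 1, 0],
})
print([p.name for p in people])  # ['Nish', 'Sally', 'Mark']
```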

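Yes, your Monte Carlo solution is very time-consuming. Depending on the density of available slots, it may also be fatally flawed: when free slots are sparse, random samples of five rarely share a slot, so it can hit the resample limit while valid groups still exist.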
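To make the density point concrete, here is a rough estimate under a simplifying assumption that is not in the question: every person is free in every slot independently with probability `p_free`. A given slot is free for all five members with probability `p_free**5`, so a random group shares at least one of `n_slots` slots with probability `1 - (1 - p_free**5)**n_slots`. The function names and the 16-slot day (8 hours of 30-minute slots) are illustrative.

```python
def p_common_slot(p_free, n_slots, group_size=5):
    """P(a random group shares >= 1 free slot), assuming each person is free
    in each slot independently with probability p_free."""
    return 1 - (1 - p_free ** group_size) ** n_slots


def p_all_fail(p_free, n_slots, tries=1000, group_size=5):
    """P(all `tries` independent random samples fail to share a slot)."""
    return (1 - p_common_slot(p_free, n_slots, group_size)) ** tries


for p in (0.5, 0.3, 0.1):
    print(p, p_common_slot(p, 16), p_all_fail(p, 16))
```

Under this model, with `p_free = 0.1` and 16 slots only about 0.016% of random groups are valid, so 1000 consecutive failures happen roughly 85% of the time. In practice it is worse, because the people left in the pool late in the run tend to be the ones who are hardest to match.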

Instead, construct viable sets. For each slot, build a set of all people available. Now, all you need to do is iterate through all of the 5-member subsets of that set. Repeat this for each time slot. You now have all sets of five members with at least one time slot in common.
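A minimal sketch of this enumeration with `itertools.combinations`, assuming the same layout as above (a dict from person to a list of 0/1 flags, 0 = free); `candidate_groups` is a hypothetical name. Groups are stored as frozensets so that a group sharing several free slots is counted only once.

```python
from itertools import combinations


def candidate_groups(availability, group_size=5):
    """All distinct groups of `group_size` people sharing >= 1 free slot."""
    n_slots = len(next(iter(availability.values()), []))
    groups = set()
    for s in range(n_slots):
        # Everyone free in slot s; any subset of them is a valid group.
        free = sorted(p for p, row in availability.items() if row[s] == 0)
        groups.update(frozenset(c) for c in combinations(free, group_size))
    return groups
```

Be aware that the number of subsets grows very quickly: 300 people free in a single slot already give C(300, 5) ≈ 2 × 10^10 groups. Choosing the largest collection of mutually exclusive groups from the candidates is then a separate set packing problem.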

Prune