3

Say I need to place n=30 students into groups of between 2 and 6, and I collect the following preference data from each student:

Student Name: Tom

Likes to sit with: Jimi, Eric

Doesn't like to sit with: John, Paul, Ringo, George

It's implied that they're neutral about any other student in the overall class that they haven't mentioned.

How might I best run a large number of simulations of many different/random grouping arrangements, to be able to determine a score for each arrangement, through which I could then pick the "most optimal" score/arrangement?

Alternatively, are there any other methods by which I might be able to calculate a solution that satisfies all of the supplied constraints?

I'd like a generic method that can be reused on different class sizes each year, but within each simulation run, the following constants and variables apply:

Constants: Total number of students, Student preferences

Variables: Group sizes, Student Groupings, Number of different group arrangements/iterations to test

Thanks in advance for any help/advice/pointers provided.

sascha
FrugalTPH
  • Perhaps you could look for "feature selection algorithms." – rajah9 Jan 05 '20 at 12:20
  • I did wonder about using some kind of random / semi-random mutation & selection algorithm. I saw something years ago that Richard Dawkins made to match literary quotations in circa 20-30 iterations vs not converging on matches at all using totally random guesses (monkeys on type-writers approach). – FrugalTPH Jan 05 '20 at 12:25
  • This is an NP-HARD problem. You won't find optimal solutions for all input sizes. What you can try is to either run a greedy algorithm multiple times, or use a genetic algorithm or some other random optimization method for trying to find optimal values. – Omri374 Jan 05 '20 at 12:25
  • If this is more about feasibility (more or less a high chance that there is a valid solution treating those constraints as hard), a SAT-solver will be very hard to beat (given a good formulation including symmetry-breaking). If chances are slim and this is about min-violation, things might slowly shift towards other techniques like integer-programming (potentially quadratic) or (meta-)heuristics. Monte-Carlo-like simulation as described is a good basic algorithm, but it should also include local-search and potentially large neighborhood search. And as mentioned: I expect it to be NP-hard too. – sascha Jan 05 '20 at 12:25
  • @maytham-ɯɐɥʇʎɐɯ That looks interesting, but wouldn't it be multipartite graph, seeing as I'm trying to form small groups rather than one to one pairings? – FrugalTPH Jan 05 '20 at 12:26
  • I could potentially inflict penalties on the "doesn't like to sit with" constraint, ignoring the "likes to sit with", and then do monte-carlo to find least-penalty solutions. It doesn't have to be THE OPTIMAL solution, just the best of however many iterations I run. – FrugalTPH Jan 05 '20 at 12:29
  • @sascha I'll look into that SAT-solver stuff. Local search sounds interesting too as I presume there will be small micro-groupings that work well together which would be the more successful traits / features inherited in genetic selection type algorithms. – FrugalTPH Jan 05 '20 at 12:32
  • Monte-carlo is trivially implemented and can be used for many variants of this problem. But it will not compete with more advanced / more exploiting approaches. If it's enough for you, use it. So probably also start with it: from simple to more complex. E.g. monte-carlo + local-search as *polishing* ~ multi-start search (+ a few iterations of LNS; but that's usually harder to implement, often using some black-box solver like SAT or CP). You can also start looking into the keyword *wedding seating problem*. There are different variants and lots of structure is shared. – sascha Jan 05 '20 at 12:33
  • @sascha Will give it a go. Do you know of any useful helper libraries for such algorithms? I programme in C# & Javascript mostly (visual studio & C# would be my goto for small programmes like this). – FrugalTPH Jan 05 '20 at 12:37
  • No one asked so far about the topology of the class room. If there are multiple rows, solutions might differ (if in front and behind and diagonally in front and behind do not count into the restrictions). I have the feeling topology matters. Maybe even to the class of the problem. – BitTickler Jan 05 '20 at 12:40
  • No, it's too early to decide on libraries. Do some monte-carlo experiments first and decide on some more formal treatment (e.g. relaxation / min-violation). I would not do much more before that. In regards to SAT, MIP, CP or most of the discrete-optimization software, the *good* solvers are all C, C++, Fortran. So you will need some wrapper, **if** going that route. No pure C#/Javascript, I would expect. – sascha Jan 05 '20 at 12:41
  • @BitTickler Yeah I didn't mention classroom layout / geometry as I didn't want to over-complicate things for now. It would totally change things from a grouping problem to a proximity problem. For now I've assumed that student psychology of "I don't want to be in the same group as x" will be the driving factor. :S – FrugalTPH Jan 05 '20 at 12:43
  • @sascha How can that be? Imagine a class room with one vertical row of chairs - no one sits next to anyone. Or imagine a class room with only one horizontal row of chairs - everyone except the outside has 2 neighbors... Imagine a 3 dimensional class room with chairs arranged along the grid of a cube... Now everyone has 4 or 8 neighbors, depending on the counting rules... Does proximity matter or only direct neighbors?... – BitTickler Jan 05 '20 at 12:45
  • And here we are, heading towards the first problem: let's assume you go for SAT-based approaches. Likely unbeatable in the feasibility problem. Then you change your problem to proximity-optimization instead of the simpler conflict-graph. Now, SAT will struggle and all previous work is lost. This happens in discrete-optimization all the time, when not doing enough theory and formal treatment first (including instance statistics), as NP-hardness is all about heuristics / exploitation of those *problem-dependent structures*. – sascha Jan 05 '20 at 12:45
  • I think just getting the groups as good as can be will be enough for now, and then the teacher will be able to position each person on each table manually, as that will be much easier for them to do once they are working with groups between 2-6 rather than 30+. I.e. I don't think proximity is a relevant consideration for the OP. – FrugalTPH Jan 05 '20 at 12:49
  • I muse, the problem might actually become algorithmically easier if you do consider topology. A Kohonen network comes to mind as a fancy way to sort it out. Likers attract each other, dislikers repulse each other... on a 2D plane... then snap to grid. – BitTickler Jan 05 '20 at 12:50
  • @BitTickler Oh actually that does sound an interesting way of looking at it. I hadn't thought of that. How would I go about modelling such? – FrugalTPH Jan 05 '20 at 12:53
  • What about non-bilateral relationships? If A likes B but B hates A? This kind of is a problem for my approach right now - A runs after B and B runs away lol. Everyone runs from the group of class bullies and huddles in the corner... it's kind of funny... not sure if I should present it as an answer though. – BitTickler Jan 05 '20 at 17:58
  • @BitTickler Yeah I might have to have a small amount of pre-processing to negate foibles like that. There's also some input data saying "Doesn't like to sit with: Boys (urgh)", which will need similar expansion. :D – FrugalTPH Jan 05 '20 at 18:01
  • Some real-world raw data as an example can be found here (matrix in xlsx form): https://www.dropbox.com/s/nz9qj0z0fyve290/class1_rawData.xlsx?dl=0 – FrugalTPH Jan 05 '20 at 19:45
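
The Monte Carlo plus local-search idea raised in the comments can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a tuned solver: preferences are stored as a dict mapping (student, other) to +1 or -1 with missing pairs treated as neutral, the group sizes are fixed up front, and the names `score`, `random_groups`, `polish` and `search` are made up for this example.

```python
import random

def score(groups, prefs):
    """Sum directed preferences over every ordered pair sharing a group."""
    return sum(prefs.get((a, b), 0)
               for g in groups for a in g for b in g if a != b)

def random_groups(students, sizes, rng):
    """Shuffle the students and cut the shuffled list into the given sizes."""
    order = list(students)
    rng.shuffle(order)
    groups, start = [], 0
    for size in sizes:
        groups.append(order[start:start + size])
        start += size
    return groups

def polish(groups, prefs, rng, steps=200):
    """Local search: swap two students between groups, undo if the score drops."""
    best = score(groups, prefs)
    for _ in range(steps):
        g1, g2 = rng.sample(range(len(groups)), 2)  # needs at least 2 groups
        i = rng.randrange(len(groups[g1]))
        j = rng.randrange(len(groups[g2]))
        groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]
        new = score(groups, prefs)
        if new >= best:
            best = new
        else:  # revert the swap
            groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]
    return best

def search(students, sizes, prefs, restarts=50, seed=0):
    """Multi-start search: random arrangement, polish, keep the best seen."""
    assert sum(sizes) == len(students)
    rng = random.Random(seed)
    best_score, best_groups = None, None
    for _ in range(restarts):
        groups = random_groups(students, sizes, rng)
        s = polish(groups, prefs, rng)
        if best_score is None or s > best_score:
            best_score, best_groups = s, [list(g) for g in groups]
    return best_score, best_groups

# Toy data based on the example in the question (six students only).
students = ["Tom", "Jimi", "Eric", "John", "Paul", "Ringo"]
prefs = {("Tom", "Jimi"): 1, ("Tom", "Eric"): 1,
         ("Tom", "John"): -1, ("Tom", "Paul"): -1, ("Tom", "Ringo"): -1}
best_score, best_groups = search(students, [3, 3], prefs)
print(best_score, best_groups)
```

To let group sizes vary between 2 and 6, one option is to draw a fresh list of sizes for each restart. Weighting dislikes more heavily than likes only requires changing the values stored in `prefs`.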

3 Answers

5

I believe you can state this as an explicit mathematical optimization problem.

Define the binary decision variables:

x(p,g) = 1 if person p is assigned to group g
         0 otherwise

I used:

[Image: model formulation, not reproduced here]
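
The model itself was posted as an image. One formulation consistent with the variables above and the notes below, offered as an assumption rather than a transcription of that image, uses c(p,q) for the -1/0/+1 preference of person p about person q and s(g) for the fixed size of group g:

```latex
\begin{aligned}
\max\ & \sum_{g} \sum_{p < q} \left(c_{p,q} + c_{q,p}\right) x_{p,g}\, x_{q,g} \\
\text{s.t.}\ & \sum_{g} x_{p,g} = 1 && \forall p \\
& \sum_{p} x_{p,g} = s_g && \forall g \\
& x_{p,g} \in \{0, 1\}
\end{aligned}
```

The first constraint puts every person in exactly one group, the second fixes the group sizes, and the quadratic objective counts a preference only when both persons share a group.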

I used your data set with 28 persons, and your preference matrix (with -1,+1,0 elements). For groups, I used 4 groups of 6 and 1 group of 4. A solution can look like:

----     80 PARAMETER solution  using MIQP model

               group1      group2      group3      group4      group5

aimee               1
amber-la                                                1
amber-le                                                            1
andrina             1
catelyn-t                                   1
charlie                                                 1
charlotte                                   1
cory                            1
daniel                          1
ellie               1
ellis               1
eve                                         1
grace-c                                                 1
grace-g                                                 1
holly                                                   1
jack                            1
jade                                                                1
james                           1
kadie                                       1
kieran                                                              1
kristiana                                   1
lily                                                                1
luke                            1
naz                 1
nibah                                       1
niko                            1
wiki                1
zeina                                                   1
COUNT               6           6           6           6           4

Notes:

  • This model can be linearized, so it can be fed into a standard MIP solver.
  • I solved this directly as a MIQP model (actually the solver reformulated the model into a MIP). The model solved in a few seconds.
  • Probably we need to add extra logic to make sure one person is not getting a really bad assignment. We optimize here only the total sum. This overall sum may allow an individual to get a bad deal. It is an interesting exercise to take this into account in the model. There are some interesting trade-offs.
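
The linearization mentioned in the first note is commonly done by replacing each product of two binary variables with a new binary variable and three linear constraints. The check below enumerates that standard construction; the name `y` and the helper `feasible_y` are illustrative assumptions, not taken from the model above.

```python
from itertools import product

def feasible_y(x1, x2):
    """Binary values of y allowed by: y <= x1, y <= x2, y >= x1 + x2 - 1."""
    return [y for y in (0, 1)
            if y <= x1 and y <= x2 and y >= x1 + x2 - 1]

# For every combination of the two binaries, the only feasible y is x1 * x2.
for x1, x2 in product((0, 1), repeat=2):
    print(x1, x2, feasible_y(x1, x2))
```

In the grouping model, x1 and x2 would stand for x(p,g) and x(q,g), with one such y per pair of persons and group.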
Erwin Kalvelagen
  • Thanks for this, I'll have a play with the method and see what tuning / tweaks it might need to overcome some of the downfalls you mentioned. – FrugalTPH Jan 13 '20 at 10:55
1

A first approach: create an n x n matrix, where n is the total number of students. Row and column indexes are the students' ordinals, and each row holds one student's preferences about sitting with the others. Fill the cells with 1 = likes to sit with, -1 = doesn't like to sit with, 0 = neutral. The main diagonal (i,i) is filled with zeroes too.

         Mark  Maria  John  Peter
Mark       0     1    -1      1
Maria      0     0    -1      1
John      -1     1     0      1
Peter      0

Scores are sums of these values. For example, John likes to sit with Maria (1), but Maria doesn't like to sit with John (-1), so the pair scores 0. The best result for a pair is a sum of 2, when both like each other.
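
A minimal sketch of building such a matrix and scoring a pair follows. The helper names `build_matrix` and `pair_score` and the dict-based input format are assumptions for illustration. Peter's row is incomplete in the example above, so he is simply left neutral here.

```python
def build_matrix(names, likes, dislikes):
    """Return an n x n list of lists; row i holds student i's view of the others."""
    index = {name: k for k, name in enumerate(names)}
    m = [[0] * len(names) for _ in names]
    for name, others in likes.items():
        for other in others:
            m[index[name]][index[other]] = 1
    for name, others in dislikes.items():
        for other in others:
            m[index[name]][index[other]] = -1
    return m

def pair_score(m, i, j):
    """Sum of what i thinks of j and what j thinks of i."""
    return m[i][j] + m[j][i]

names = ["Mark", "Maria", "John", "Peter"]
likes = {"Mark": ["Maria", "Peter"], "Maria": ["Peter"], "John": ["Maria", "Peter"]}
dislikes = {"Mark": ["John"], "Maria": ["John"], "John": ["Mark"]}
m = build_matrix(names, likes, dislikes)

# John likes Maria (1), Maria dislikes John (-1).
print(pair_score(m, names.index("John"), names.index("Maria")))
```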

Then, for each allowed group size, calculate the score of every possible combination of students. The bigger the score, the better the arrangement. Combinations skip the main diagonal, i.e. John grouped with John is not a valid pair.

In a group of size 2, the best score is 2.

In a group of size 3, the best score is 6.

In a group of size 4, the best score is 12.

In a group of size k, the best score is (k-1)*k.
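
These maxima can be checked with a small sketch. The function names are hypothetical, and the all-ones matrix models a group in which everyone likes everyone else.

```python
def group_score(group, m):
    """Sum m[i][j] over all ordered pairs of distinct group members."""
    return sum(m[i][j] for i in group for j in group if i != j)

def all_like(size):
    """Matrix where every student likes every other; zeroes on the diagonal."""
    return [[0 if i == j else 1 for j in range(size)] for i in range(size)]

# Each group of the given size has size * (size - 1) ordered pairs, all worth 1.
for size in (2, 3, 4, 6):
    print(size, group_score(range(size), all_like(size)))
```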

Now sort the list of combinations / groups by score and take the highest-scoring tuples first, skipping any tuple that contains a student already placed in an earlier tuple.
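
A rough sketch of that greedy selection for a single fixed group size is shown below. The name `greedy_groups` and the tiny four-student matrix are assumptions for illustration. Being greedy, it is not guaranteed to find the best overall arrangement, and it leaves students over when the class size is not a multiple of the group size.

```python
from itertools import combinations

def group_score(group, m):
    """Sum m[i][j] over all ordered pairs of distinct group members."""
    return sum(m[i][j] for i in group for j in group if i != j)

def greedy_groups(m, size):
    """Take the best-scoring tuples first, skipping students already placed."""
    candidates = sorted(combinations(range(len(m)), size),
                        key=lambda g: group_score(g, m), reverse=True)
    used, chosen = set(), []
    for g in candidates:
        if used.isdisjoint(g):
            chosen.append(g)
            used.update(g)
    return chosen

# Students 0 and 1 like each other, 2 and 3 like each other, 0 and 2 dislike each other.
m = [[0, 1, -1, 0],
     [1, 0, 0, 0],
     [-1, 0, 0, 1],
     [0, 0, 1, 0]]
print(greedy_groups(m, 2))
```

Enumerating every combination is practical for a class of 30 with groups of up to 6, but the number of tuples grows quickly with class size and group size.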

D3V3X
  • Thanks for the suggestion, I'll have a play with this amongst other options later when I'm in front of the input data. Thanks again. – FrugalTPH Jan 05 '20 at 14:01
0

In recent research, particle swarm optimization (PSO) was used to assign students to an unknown number of groups of 4 to 6. PSO showed improved capabilities compared to a genetic algorithm (GA). I think that specific research is all you need.

The paper is: Forming automatic groups of learners using particle swarm optimization for applications of differentiated instruction

You can find the paper here: https://doi.org/10.1002/cae.22191

Perhaps the researchers could guide you through researchgate: https://www.researchgate.net/publication/338078753

Regarding the optimal seating, you need to specify an objective function for your specific data.

Gus Rdrm