
I have a set of N items that I want to split into K subsets of sizes n1, n2, ..., nk (with n1 + n2 + ... + nk = N).

I also have constraints on which items can belong to which subsets.

For my problem, at least one solution always exists.

I'm looking to implement an algorithm in Python to generate (at least) one solution.

Example:

Possibilities:

Item\Subset   0      1      2
A             True   True   False
B             True   True   True
C             False  False  True
D             True   True   True
E             True   False  False
F             True   True   True
G             False  False  True
H             True   True   True
I             True   True   False

Size constraints: (3, 3, 3)

Possible solution: [0, 0, 2, 1, 0, 1, 2, 2, 1]
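To pin down what counts as a solution, here is a small plain-Python check. `is_valid` is a hypothetical helper name, and the table is assumed to be a list of lists of booleans with one row per item, in the order A to I.

```python
def is_valid(solution, possibilities, sizes):
    """Return True if `solution` respects both the allowed-subset table and the sizes."""
    if len(solution) != len(possibilities):
        return False
    # Every item must go to a subset it is allowed to join.
    if not all(row[subset] for row, subset in zip(possibilities, solution)):
        return False
    # Each subset must receive exactly its target number of items.
    return all(solution.count(k) == size for k, size in enumerate(sizes))


# The 9-item example above: rows A..I, columns = subsets 0..2.
possibilities = [
    [True, True, False],   # A
    [True, True, True],    # B
    [False, False, True],  # C
    [True, True, True],    # D
    [True, False, False],  # E
    [True, True, True],    # F
    [False, False, True],  # G
    [True, True, True],    # H
    [True, True, False],   # I
]
sizes = (3, 3, 3)

print(is_valid([0, 0, 2, 1, 0, 1, 2, 2, 1], possibilities, sizes))  # True
```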

Implementation:

So far, I have tried brute force successfully, but now I want to find a more efficient algorithm.

I was thinking about backtracking, but I'm not sure it is the right method, nor whether my implementation is correct:

import pandas as pd
import numpy as np
import string

def solve(possibilities, constraints_sizes):
    solution = [None] * len(possibilities)

    def extend_solution(position):
        possible_subsets = [index for index, value in possibilities.iloc[position].items() if value]
        for subset in possible_subsets:
            solution[position] = subset
            unique, counts = np.unique([a for a in solution if a is not None], return_counts=True)
            if all(length <= constraints_sizes[sub] for sub, length in zip(unique, counts)):
                if position >= len(possibilities)-1 or extend_solution(position+1):
                    return solution
        return None

    return extend_solution(0)


if __name__ == '__main__':

    constraints_sizes = [5, 5, 6]
    
    possibilities = pd.DataFrame([[False, True, False],
                                  [True, True, True],
                                  [True, True, True],
                                  [True, True, True],
                                  [True, False, False],
                                  [True, True, True],
                                  [True, True, True],
                                  [True, True, True],
                                  [True, False, False],
                                  [True, True, True],
                                  [True, True, True],
                                  [True, True, True],
                                  [False, True, True],
                                  [True, True, True],
                                  [True, True, True],
                                  [True, False, False]],
                                 index=list(string.ascii_lowercase[:16]))
    
    solution = solve(possibilities, constraints_sizes)

One possible expected solution: [1, 0, 0, 1, 0, 1, 1, 1, 0, 2, 2, 2, 2, 2, 2, 0]

Unfortunately, this code fails to find a solution (even though it works with the previous example).

What am I missing?

Thank you very much.

Betcha

2 Answers


This problem can be solved by setting up a bipartite flow network with Items on one side, Subsets on the other, a surplus of 1 at each Item, a deficit of (Subset's size) at each Subset, and arcs of capacity 1 from each Item to each Subset to which it can belong. Then you need a maximum flow on this network; OR-Tools can do this, but you have a lot of options.
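A minimal plain-Python sketch of this idea, assuming the table is a list of lists of booleans and that the sizes sum to the number of items. Instead of a library it uses Edmonds-Karp (repeated breadth-first searches for augmenting paths, here on an adjacency matrix); the name `solve_max_flow` and the node numbering are illustrative choices. If the maximum flow equals the number of items, every item has been routed to exactly one allowed subset and, because the sizes sum to N, every subset is exactly full.

```python
from collections import deque


def solve_max_flow(possibilities, sizes):
    """Assign items to subsets with a max flow (Edmonds-Karp).

    Node layout: 0 = source, 1..n = items, n+1..n+k = subsets, n+k+1 = sink.
    Returns a list of subset indices, or None if not every item can be placed.
    """
    n, k = len(possibilities), len(sizes)
    source, sink = 0, n + k + 1
    num_nodes = n + k + 2
    # cap[u][v] holds the residual capacity of edge u -> v.
    cap = [[0] * num_nodes for _ in range(num_nodes)]
    for i, row in enumerate(possibilities):
        cap[source][1 + i] = 1              # each item carries one unit of flow
        for j, allowed in enumerate(row):
            if allowed:
                cap[1 + i][1 + n + j] = 1   # item -> subset it may join
    for j, size in enumerate(sizes):
        cap[1 + n + j][sink] = size         # subset accepts at most `size` items

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * num_nodes
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(num_nodes):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            break                           # no augmenting path left
        # Bottleneck capacity along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while v != source:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck         # reverse edge lets later paths reroute
            v = u
        flow += bottleneck

    if flow < n:
        return None
    # Flow used edge item i -> subset j exactly when its reverse capacity is 1.
    return [next(j for j in range(k) if cap[1 + n + j][1 + i] == 1)
            for i in range(n)]


possibilities = [
    [True, True, False], [True, True, True], [False, False, True],   # A-C
    [True, True, True], [True, False, False], [True, True, True],    # D-F
    [False, False, True], [True, True, True], [True, True, False],   # G-I
]
print(solve_max_flow(possibilities, (3, 3, 3)))
```

Unlike depth-first backtracking, each augmentation adds one item and can reroute earlier choices through the reverse edges, so the total work stays polynomial.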

David Eisenstat
  • Thank you for your reply. The tool and the concept that you mention are very unfamiliar to me. I was hoping to find a simple algorithm that I can implement with the Python standard library. I would also like to understand whether my problem can't be solved with backtracking, or whether my implementation is bad. Thanks again. – Betcha Dec 14 '21 at 17:09
  • @Betcha You can find an algorithm you can implement at https://en.wikipedia.org/wiki/Edmonds%E2%80%93Karp_algorithm. Conceptually what you're doing is starting by randomly putting items into buckets until you're stuck. Then you start doing breadth-first searches for ways to switch assignments around to get one more item in. The cool thing is you can always succeed in polynomial time. – btilly Dec 14 '21 at 18:21
  • @Betcha Backtracking is a similar idea but with depth first searches instead of breadth first. The problem is that if the decision you need to switch is your first assignment, you have to explore the ENTIRE search space before trying that. You "get stuck down an exponential rabbit hole" before making the next bit of progress. Breadth first avoids ever being stuck that way. – btilly Dec 14 '21 at 18:23
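On the question's backtracking code: when `extend_solution` runs out of subsets for a position, it returns `None` without resetting `solution[position]` to `None`. That leftover value (and those at later positions) is still counted by `np.unique` when an earlier position tries its next subset, so branches that could succeed get rejected. Below is a plain-Python sketch of backtracking (no pandas or NumPy) that keeps running counts and undoes each choice on the way back; `solve_backtracking` is a hypothetical name, and the table is assumed to be a list of lists of booleans.

```python
def solve_backtracking(possibilities, sizes):
    """Depth-first backtracking; returns a list of subset indices or None."""
    n = len(possibilities)
    solution = [None] * n
    counts = [0] * len(sizes)              # items currently placed in each subset

    def extend(position):
        if position == n:
            return counts == list(sizes)   # every subset exactly full
        for subset, allowed in enumerate(possibilities[position]):
            if allowed and counts[subset] < sizes[subset]:
                solution[position] = subset
                counts[subset] += 1
                if extend(position + 1):
                    return True
                counts[subset] -= 1        # undo this choice before the next one
        solution[position] = None          # clear the slot on the way back
        return False

    return solution if extend(0) else None


# The 16-item example from the question: rows a..p, columns = subsets 0..2.
rows = [
    [False, True, False], [True, True, True], [True, True, True], [True, True, True],   # a-d
    [True, False, False], [True, True, True], [True, True, True], [True, True, True],   # e-h
    [True, False, False], [True, True, True], [True, True, True], [True, True, True],   # i-l
    [False, True, True], [True, True, True], [True, True, True], [True, False, False],  # m-p
]
print(solve_backtracking(rows, [5, 5, 6]))
# [1, 0, 0, 1, 0, 1, 1, 1, 0, 2, 2, 2, 2, 2, 2, 0]
```

Because subsets are tried in the order 0, 1, 2, this returns the lexicographically smallest valid assignment, which for this input is the expected solution from the question. In bad cases the depth-first search can still take exponential time, as described in the comment above.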

@David Eisenstat mentioned OR-Tools as a package to solve this kind of problem.

Thanks to him, I found that this problem matches one of their examples, an Assignment with Task Sizes problem.

It matches my understanding of the problem better than what I understood from the suggested "Flow network" concept, but I'd be happy to discuss that.

Here is the solution I implemented, based on their example:

from ortools.sat.python import cp_model


def solve(possibilities, constraint_sizes):
    # Transform possibilities into costs (0 if possible, 1 otherwise)
    costs = [[int(not row[subset]) for row in possibilities] for subset in range(len(possibilities[0]))]
    
    num_subsets = len(costs)
    num_items = len(costs[0])
    
    model = cp_model.CpModel()
    
    # Variables
    x = {}
    for subset in range(num_subsets):
        for item in range(num_items):
            x[subset, item] = model.NewBoolVar(f'x[{subset},{item}]')

    # Constraints :
    # Each subset should contain a given number of items
    for subset, size in zip(range(num_subsets), constraint_sizes):
        model.Add(sum(x[subset, item] for item in range(num_items)) <= size)
    
    # Each item is assigned to exactly one subset
    for item in range(num_items):
        model.Add(sum(x[subset, item] for subset in range(num_subsets)) == 1)
    
    # Objective
    objective_terms = []
    for subset in range(num_subsets):
        for item in range(num_items):
            objective_terms.append(costs[subset][item] * x[subset, item])
    model.Minimize(sum(objective_terms))
    
    # Solve
    solver = cp_model.CpSolver()
    status = solver.Solve(model)
    if status == cp_model.OPTIMAL or status == cp_model.FEASIBLE:
        solution = []
        for item in range(num_items):
            for subset in range(num_subsets):
                if solver.BooleanValue(x[subset, item]):
                    solution.append(subset)
        return solution
    return None

The trick here is to transform the possibilities into costs (0 only if the assignment is allowed) and to minimize the total cost. An acceptable solution then has a total cost of 0.
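To illustrate the transform on the 9-item example from the question: `costs` is indexed as `costs[subset][item]` (the table is transposed), and a feasible assignment only picks zero-cost cells. This sketch just re-runs the first line of `solve` in plain Python, assuming the table is a list of lists of booleans.

```python
# 9-item example: rows A..I, columns = subsets 0..2.
possibilities = [
    [True, True, False], [True, True, True], [False, False, True],   # A-C
    [True, True, True], [True, False, False], [True, True, True],    # D-F
    [False, False, True], [True, True, True], [True, True, False],   # G-I
]

# Same transform as in solve(): 0 if the item may join the subset, 1 otherwise.
costs = [[int(not row[subset]) for row in possibilities]
         for subset in range(len(possibilities[0]))]
for subset, row in enumerate(costs):
    print(subset, row)
# 0 [0, 0, 1, 0, 0, 0, 1, 0, 0]
# 1 [0, 0, 1, 0, 1, 0, 1, 0, 0]
# 2 [1, 0, 0, 0, 1, 0, 0, 0, 1]

# Total cost of the example solution: 0, so every assignment is allowed.
solution = [0, 0, 2, 1, 0, 1, 2, 2, 1]
total = sum(costs[subset][item] for item, subset in enumerate(solution))
print(total)  # 0
```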

It gives a correct solution for the previous problem:

possibilities = [[False, True, False],
                 [True, True, True],
                 [True, True, True],
                 [True, True, True],
                 [True, False, False],
                 [True, True, True],
                 [True, True, True],
                 [True, True, True],
                 [True, False, False],
                 [True, True, True],
                 [True, True, True],
                 [True, True, True],
                 [False, True, True],
                 [True, True, True],
                 [True, True, True],
                 [True, False, False]]

constraint_sizes = [5, 5, 6]

solution = solve(possibilities, constraint_sizes)

print(solution) # [1, 2, 1, 0, 0, 0, 2, 1, 0, 1, 2, 2, 2, 2, 1, 0]

I now have two more questions:

  • Can we turn the optimization objective (minimize the cost) into a hard constraint (cost must equal 0)? I guess it could reduce the computing time.

  • How can I get other solutions, not just one?

I am also still looking for a plain Python solution without any library...
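Regarding getting several solutions without any library: a plain-Python sketch that turns the backtracking idea into a generator, so it yields every valid assignment one at a time instead of stopping at the first. `iter_solutions` is a hypothetical name and the table is assumed to be a list of lists of booleans. The number of solutions can grow exponentially, so it is best consumed lazily (e.g. with `next` or `itertools.islice`).

```python
def iter_solutions(possibilities, sizes):
    """Yield every assignment (list of subset indices) satisfying both constraints."""
    n = len(possibilities)
    solution = [None] * n
    counts = [0] * len(sizes)                 # items currently placed in each subset

    def extend(position):
        if position == n:
            if counts == list(sizes):
                yield list(solution)          # copy, since `solution` is reused
            return
        for subset, allowed in enumerate(possibilities[position]):
            if allowed and counts[subset] < sizes[subset]:
                solution[position] = subset
                counts[subset] += 1
                yield from extend(position + 1)
                counts[subset] -= 1           # undo before trying the next subset

    yield from extend(0)


# 9-item example: rows A..I, columns = subsets 0..2.
possibilities = [
    [True, True, False], [True, True, True], [False, False, True],   # A-C
    [True, True, True], [True, False, False], [True, True, True],    # D-F
    [False, False, True], [True, True, True], [True, True, False],   # G-I
]
print(next(iter_solutions(possibilities, (3, 3, 3))))  # [0, 0, 2, 1, 0, 1, 2, 2, 1]
print(sum(1 for _ in iter_solutions(possibilities, (3, 3, 3))))  # 40
```

The count of 40 can be checked by hand: C, G and E are forced, exactly one of B, D, F, H joins subset 2 (4 choices), and the remaining 5 items split 2/3 between subsets 0 and 1 (10 ways).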

Thank you

Betcha
  • As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – Community Dec 18 '21 at 17:54
  • If you have a new question, it should go in a new post. SO doesn't work like a discussion board – camille Dec 18 '21 at 18:46