The following data is given as df:
id | class | country | weights |
---|---|---|---|
a | 1 | US | 20 |
b | 2 | US | 5 |
a | 2 | CH | 5 |
a | 1 | CH | 10 |
b | 1 | CH | 5 |
c | 1 | US | 10 |
b | 2 | GER | 15 |
a | 2 | CH | 5 |
c | 1 | US | 15 |
a | 1 | US | 10 |
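For reproducibility, the table can be rebuilt as a DataFrame roughly like this (a minimal sketch; it assumes pandas and the default 0..9 integer index, which the later code relies on):

```python
import pandas as pd

# Same ten rows as in the table, in the same order
df = pd.DataFrame({
    "id":      ["a", "b", "a", "a", "b", "c", "b", "a", "c", "a"],
    "class":   [1, 2, 2, 1, 1, 1, 2, 2, 1, 1],
    "country": ["US", "US", "CH", "CH", "CH", "US", "GER", "CH", "US", "US"],
    "weights": [20, 5, 5, 10, 5, 10, 15, 5, 15, 10],
})
```

The weights sum to 100, so each group total can be read directly as a percentage.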
The goal is to create an alternative allocation of the weights column while keeping the distribution across the unique values in id, class and country. For example, id "a" accounts for 5 of the 10 rows and for 50 of the 100 total weight (50%). An alternative solution for the weights should keep this share of a = 50%, and likewise the share of every other unique value in the first three columns.
For this I created the following code to get a dict with the distribution:
constraint_columns = ["id", "class", "country"]
constraints = {}
for column in constraint_columns:
    # Total weight per unique value of this column
    constraints[column] = df.groupby(column)["weights"].sum().to_dict()
The result looks as follows:
{'id': {'a': 50, 'b': 25, 'c': 25},
'class': {1: 70, 2: 30},
'country': {'CH': 25, 'GER': 15, 'US': 60}}
I then initialize the model, create the variables for the model to solve (the weights, one per row) and create the constraints by looping through my constraints dict and mapping each group to its variables:
from ortools.sat.python import cp_model

model = cp_model.CpModel()
solver = cp_model.CpSolver()
# One integer variable per row of df, keyed by row position
dict_weights = {}
for count in range(len(df)):
    dict_weights[count] = model.NewIntVar(0, 100, f"weight_{count}")
weights_full = list(dict_weights.values())
I allow each group total to deviate by up to 5% from its original value:
for constraint in constraints:
    for key, target in constraints[constraint].items():
        # Row positions that belong to this unique value
        keys = df.loc[df[constraint] == key].index
        group_sum = sum(dict_weights[k] for k in keys)
        # int() truncates, so the bounds are rounded down
        model.Add(group_sum >= int(target - target * 0.05))
        model.Add(group_sum <= int(target + target * 0.05))
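To make the 5% band concrete, here is a small plain-Python sketch of the bounds that the two `model.Add` calls produce for the id totals. The names `targets`, `tolerance` and `bounds` are only for illustration and are not part of my model code:

```python
# Per-id totals copied from the constraints dict
targets = {"a": 50, "b": 25, "c": 25}
tolerance = 0.05

# Same arithmetic as in the constraints: int() truncates toward zero
bounds = {
    key: (int(t - t * tolerance), int(t + t * tolerance))
    for key, t in targets.items()
}
print(bounds)  # {'a': (47, 52), 'b': (23, 26), 'c': (23, 26)}
```

Because of the truncation the band is not symmetric: "a" may drop by 3 but only rise by 2.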
I solve the model and everything works fine:
solver.parameters.cp_model_presolve = False # type: ignore
solver.parameters.max_time_in_seconds = 0.01 # type: ignore
solution_collector = VarArraySolutionCollector(weights_full)
solver.SolveWithSolutionCallback(model, solution_collector)
solution_collector.solution_list
Solution:
[0, 0, 0, 0, 8, 0, 15, 15, 23, 35]
In a next step I want to tell the model that the result should consist of a specific number of non-zero weights. For example, 3: that would mean that all other weight values (7 of the 10 rows here) should be 0 and only 3 are used to find a solution that fits the distribution. Right now it does not matter if there is a feasible solution or not.
Any ideas how to solve this?