I have a Pandas dataframe with 3 columns (Product, Weight, Total Cost) as follows (expanded to make it clearer):
df = {
'Product': ['Product 1', 'Product 2', 'Product 3', 'Product 4',
'Product 1', 'Product 2', 'Product 3', 'Product 4',
'Product 1', 'Product 2', 'Product 3', 'Product 4',
'Product 1', 'Product 2', 'Product 3', 'Product 4',
'Product 1', 'Product 2', 'Product 3', 'Product 4',
'Product 1', 'Product 2', 'Product 3', 'Product 4',
'Product 1', 'Product 2', 'Product 3', 'Product 4',
'Product 1', 'Product 2', 'Product 3', 'Product 4'],
'Pack Size': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8],
'Total Cost': [2773.47, 2665.23, 23421.24, 17666.58, 1592.09, 1678.04, 12798.46, 9425.80, 1246.30, 1396.98,
9377.70, 6621.21, 1109.40, 1292.45, 7491.82, 5275.17, 1056.07, 1258.53, 6482.69, 4517.04,
1044.51, 1259.92, 5845.94, 4035.62, 1056.82, 1281.48, 5411.69, 3712.32, 1084.06, 1315.65,
5104.00, 3487.85]
}
The dataframe holds the Total Cost of each potential Weight of each Product. Every Product and Weight combination have a different cost so if there's 1000 Product and Weight combinations, there are 1000 different Total Costs.
I need to determine which 2 Weights (which must be selected from 'Weight' in df) I should choose in order to achieve the lowest Total Cost across all products (referring to the total cost for that product and weight combination).
Every product must have 1 weight assigned to it and it can only have one of these 2 weights.
The output should look as follows:
Product 1: Weight=3
Product 2: Weight=3
Product 3: Weight=3
Product 4: Weight=5
As you can see every product has a weight, and in total there are only 2 weights selected across all products, 3 and 5.
Another way to demonstrate it is to display it this way (assume there were 10 products)
Weight 6: Product 1, Product 4, Product 5
Weight 18: Product 2, Product 3, Product 6, Product 8, Product 9, Product 7, Product 10
Again, only 2 Weights selected across all products (6 and 18) and every product is assigned to one of them.
The objective is to determine which weight should be chosen for each product (and its respective total cost) in order to minimise the total cost across all products.
The dataframe will always be populated with different amounts of Products (and therefore rows), potentially over 100 Products.
There will always be 48 Weights per Product on the dataframe which are 1 ... 48 (therefore if there were 100 products, there would be 4800 rows)
Hoping someone could help me with this one. Thanks!
I've tried a variety of approaches but I don't know how to approach this with a dataframe.
The solution provided by Sebastian Wozny using pulp was just about what I needed.