1

I have a Pandas dataframe with 3 columns (Product, Weight, Total Cost) as follows (expanded to make it clearer):

df = {
    'Product': ['Product 1', 'Product 2', 'Product 3', 'Product 4',
                'Product 1', 'Product 2', 'Product 3', 'Product 4',
                'Product 1', 'Product 2', 'Product 3', 'Product 4',
                'Product 1', 'Product 2', 'Product 3', 'Product 4',
                'Product 1', 'Product 2', 'Product 3', 'Product 4',
                'Product 1', 'Product 2', 'Product 3', 'Product 4',
                'Product 1', 'Product 2', 'Product 3', 'Product 4',
                'Product 1', 'Product 2', 'Product 3', 'Product 4'],
    'Pack Size': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8],
    'Total Cost': [2773.47, 2665.23, 23421.24, 17666.58, 1592.09, 1678.04, 12798.46, 9425.80, 1246.30, 1396.98,
                   9377.70, 6621.21, 1109.40, 1292.45, 7491.82, 5275.17, 1056.07, 1258.53, 6482.69, 4517.04,
                   1044.51, 1259.92, 5845.94, 4035.62, 1056.82, 1281.48, 5411.69, 3712.32, 1084.06, 1315.65,
                   5104.00, 3487.85]
}

The dataframe holds the Total Cost of each potential Weight of each Product. Every Product and Weight combination have a different cost so if there's 1000 Product and Weight combinations, there are 1000 different Total Costs.

I need to determine which 2 Weights (which must be selected from 'Weight' in df) I should choose in order to achieve the lowest Total Cost across all products (referring to the total cost for that product and weight combination).

Every product must have 1 weight assigned to it and it can only have one of these 2 weights.

The output should look as follows:

Product 1: Weight=3
Product 2: Weight=3
Product 3: Weight=3
Product 4: Weight=5

As you can see every product has a weight, and in total there are only 2 weights selected across all products, 3 and 5.

Another way to demonstrate it is to display it this way (assume there were 10 products)

Weight 6: Product 1, Product 4, Product 5
Weight 18: Product 2, Product 3, Product 6, Product 8, Product 9, Product 7, Product 10

Again, only 2 Weights selected across all products (6 and 18) and every product is assigned to one of them.

The objective is to determine which weight should be chosen for each product (and its respective total cost) in order to minimise the total cost across all products.

The dataframe will always be populated with different amounts of Products (and therefore rows), potentially over 100 Products.

There will always be 48 Weights per Product on the dataframe which are 1 ... 48 (therefore if there were 100 products, there would be 4800 rows)

Hoping someone could help me with this one. Thanks!

I've tried a variety of approaches but I don't know how to approach this with a dataframe.

The solution provided by Sebastian Wozny using pulp was just about what I needed.

Reinderien
  • 11,755
  • 5
  • 49
  • 77
t24opb
  • 21
  • 2
  • 1
    Can you provide the expected output for your sample and explain why please? – Corralien Jul 09 '23 at 12:44
  • I have a local solution that assigns weights (6, 6, 8, 8) to products (1, 2, 3, 4) for a total cost of $10896.28. I assume that when you showed weights (3, 3, 3, 5) it was only to show the structure of the output and not the optimal solution for these data. – Reinderien Jul 10 '23 at 14:09
  • 1
    Yes that’s right, just to show the structure of the output. – t24opb Jul 10 '23 at 14:15
  • Before I forget or lose its context, [this gist](https://gist.github.com/reinderien/835a07405c0e00445ea66bd30976fc3f) contains my current interpretation of the problem. If this question is reopened I'll convert it into an answer. – Reinderien Jul 10 '23 at 14:57
  • Can you post your code so far. – Rohit Gupta Jul 11 '23 at 05:24
  • I voted to reopen, we need one more person (review queue so slow) – Sebastian Wozny Jul 11 '23 at 12:46

3 Answers3

0

Below solution uses the groupby and nsmallest methods from the pandas library to find the two smallest Total Costs for each Product, then selects the two Weights with the lowest Total Cost overall.

First, you’ll need to convert the ‘Total Cost’ column to a numeric type, since it’s currently stored as strings. You can do this using the pd.to_numeric method.

Use the groupby method to group the rows by Product and find the two smallest Total Costs for each Product using the nsmallest method.

df['Total Cost'] = pd.to_numeric(df['Total Cost'].str.replace(',', ''), errors='coerce')

grouped = df.groupby('Product')['Total Cost'].nsmallest(2)
grouped = grouped.reset_index(level=0).reset_index(drop=True)

result = grouped.nsmallest(2, columns='Total Cost')

print(result)

above code will return a DataFrame with two rows showing the Product and Weight assigned to each of the two selected Weights.

Output: output shows that the two selected Weights are both assigned to Product 1 and have Total Costs of 1246.30 and 1592.09, respectively.

  Products      Total Cost
  Product 1     1246.30
  Product 1     1592.09
Karim Baidar
  • 677
  • 3
  • 10
0

Sort your dataframe first then pivot it (without sort). The right answer is on the diagonal

# Convert Total Cost as numeric
df['Total Cost'] = pd.to_numeric(df['Total Cost'].str.replace(',', ''))

out = (df.sort_values('Total Cost')
         .pivot_table(index='Product', columns='Weight', 
                      values='Total Cost', sort=False))

Output:

>>> out
Weight           3         2         1
Product                               
Product 1  1246.30   1592.09   2773.47
Product 2  1396.98   1678.04   2665.23
Product 4      NaN   9425.80  17666.58
Product 3      NaN  12798.46  23421.24

Obviously it works because you have only 2 variables.

To extract the result, you can do:

data = list(zip(out.index, out.columns, np.diag(out)))[:2]
sol = pd.DataFrame(data, columns=df.columns)
print(sol)

# Output
     Product  Weight  Total Cost
0  Product 1       3     1246.30
1  Product 2       2     1678.04
Corralien
  • 109,409
  • 8
  • 28
  • 52
0

The problem description lends itself to solving with a Mixed-Integer-Linear Program (MIP).

A convenient library for solving mixed integer problems is PuLP that ships with the built-in Coin-OR suite and in particular the integer solver CBC.

We formulate a model that describes your problem:

import pandas as pd
from pulp import LpProblem, LpVariable, lpSum, LpMinimize

df = pd.DataFrame({
    'Product': ['Product 1', 'Product 2', 'Product 3', 'Product 4', 'Product 1', 'Product 2', 'Product 3', 'Product 4', 'Product 1', 'Product 2'],
    'Weight': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    'Total Cost': ['2,773.47', '2,665.23', '23,421.24', '17,666.58', '1,592.09', '1,678.04', '12,798.46', '9,425.80', '1,246.30', '1,396.98']
})

#Fix the total cost to numeric
df['Total Cost'] = pd.to_numeric(df['Total Cost'].str.replace(',', ''), errors='coerce')

# Decision variable for each weight product combnination
lp_vars = {}

# Cost coefficient for each product weight combination
costs = {}

for _, row in df.iterrows():
    product = row['Product']
    weight = row['Weight']
    var_name = f"{product},{weight}"  
    lp_vars[(product, weight)] = LpVariable(var_name, cat='Binary') 
    costs[(product, weight)] = row['Total Cost'] 
problem = LpProblem("Total_Cost_Minimization", LpMinimize)


#Minimize the total cost of all selected product,weight combinations
problem += lpSum([costs[(product,weight)] * lp_vars[(product,weight)] for product,weight in lp_vars.keys()])

#We're only allowed to overall select 2 weights
problem += lpSum([lp_var for lp_var in lp_vars.values()]) == 2

#We're only allowed to select up to 1 weight per product
for product in set(df['Product']):
    problem += lpSum([lp_vars[product,weight]  for weight in df.query(f'Product == "{product}"')['Weight'].tolist()]) <= 1

problem.solve()



To visualize the solution:

from pulp import LpStatus

print("Status:", LpStatus[problem.status])
print("Optimal Total Cost:", problem.objective.value())
print("Selected Weights and Assigned Products:")
for lp_var in problem.variables():
    if lp_var.varValue == 1:
        print(lp_var.name)
        product, weight = lp_var.name.split(",")
        print(f"Product: {product}, Weight: {weight}")

Output

Status: Optimal
Optimal Total Cost: 2643.2799999999997
Selected Weights and Assigned Products:
Product_1,3
Product: Product_1, Weight: 3
Product_2,3
Product: Product_2, Weight: 3
Sebastian Wozny
  • 16,943
  • 7
  • 52
  • 69