Solving Knapsack Problem for large dataset using PULP library in Python

Question

I have the below dataframe:

import itertools
import pandas as pd
import numpy as np

p1_values = np.arange(19, 29.1, 0.1)
p2_values = np.arange(23, 33.1, 0.1)
p3_values = np.arange(36, 46.1, 0.1)

p1_values = np.arange(19, 29.1, 0.1)
p2_values = np.arange(23, 33.1, 0.1)
p3_values = np.arange(36, 46.1, 0.1)

df = pd.DataFrame(list(itertools.product(p1_values, p2_values, p3_values)), columns=['P1', 'P2', 'P3'])
df['P1_Rev'] = 150000 - ((150000-25000)/(29-19))*(df['P1']-19)
df['P2_Rev'] = 150000 - ((150000-25000)/(33-23))*(df['P2']-23)
df['P3_Rev'] = 150000 - ((150000-25000)/(46-36))*(df['P3']-36)
df['P1_GM'] = 50000 - ((50000-5000)/(29-19))*(df['P1']-19)
df['P2_GM'] = 50000 - ((50000-5000)/(33-23))*(df['P2']-23)
df['P3_GM'] = 50000 - ((50000-5000)/(46-36))*(df['P3']-36)
df['P1_RM'] = 50000 - ((50000-5000)/(29-19))*(df['P1']-19)
df['P2_RM'] = 50000 - ((50000-5000)/(33-23))*(df['P2']-23)
df['P3_RM'] = 50000 - ((50000-5000)/(46-36))*(df['P3']-36)

def p1_promo(x):
    if x <= 20:
        return "HP"
    elif x <= 26:
        return "Deep"
    else:
        return "Base"

def p2_promo(x):
    if x <= 24:
        return "HP"
    elif x <= 28:
        return "Deep"
    else:
        return "Base"

def p3_promo(x):
    if x <= 36:
        return "HP"
    elif x <= 40:
        return "Deep"
    else:
        return "Base"

df['P1_Promo'] = df['P1'].apply(p1_promo)
df['P2_Promo'] = df['P2'].apply(p2_promo)
df['P3_Promo'] = df['P3'].apply(p3_promo)
df['Total_GM']=df['P1_GM']+df['P2_GM']+df['P3_GM']
df['Total_RM']=df['P1_RM']+df['P2_RM']+df['P3_RM']
df['Total_Rev']=df['P1_Rev']+df['P2_Rev']+df['P3_Rev']

Using the above dataframe I want to get a final dataframe with 52 rows where it meets the below criteria:

There should be at least 37 rows where P1_Promo = Base
There should be at least 5 rows where P1_Promo = HP
There should be at least 10 rows where P1_Promo = Deep
There should be at least 40 rows where P2_Promo = Base
There should be at least 5 rows where P2_Promo = HP
There should be at least 7 rows where P2_Promo = Deep
There should be at least 37 rows where P3_Promo = Base
There should be at least 7 rows where P3_Promo = HP
There should be at least 8 rows where P3_Promo = Deep
The Total of Total_GM should be maximum
The Total of Total_RM should be maximum
The Total of Total_Rev should be maximum

I have tried solving this using LpProblem from PULP library in python. It does give me an optimal solution - but it takes a lot of time. I want to see if there are other methods that can solve this using Python (potentially for larger datasets like 10 mil rows). Appreciate your help.

score 0 · Answer 1 · answered Mar 15 '23 at 20:22

I think a MIP model would need a binary variable for each row. That would mean a model with 10 million binary variables. This is large. However, the structure is simple, so it may be doable with a good MIP solver. It is very difficult to predict these things, so you just have to try it out.

There is one more conceptual issue.

The Total of Total_GM should be maximum
The Total of Total_RM should be maximum
The Total of Total_Rev should be maximum

means you have a multi-objective problem. One way to make a normal optimization out of this, is to form a weighted sum and use that as single objective. That requires some thought about these weights.

Solving Knapsack Problem for large dataset using PULP library in Python

1 Answers1