0

I have this optimization problem where I am trying to maximize column z based on a unique value from column X, but also within a constraint that each of the unique values picked of X added up column of Y most be less than or equal to (in this example) 23.

For example, I have this sample data:

X  Y  Z
1  9  25    
1  7  20     
1  5  5
2  9  20 
2  7  10 
2  5  5    
3  9  10 
3  7  5
3  5  5

The result should look like this:

   X  Y  Z 

  1  9  25  
  2  9  20     
  3  5  5 

This is replica for Set up linear programming optimization in R using LpSolve? with solution but I need the same in python.

blehblehbleh
  • 154
  • 6
  • There is no column `3 5 5` in you example data, how does it get there in the example output? I also don't understand the constraint, the sentence is broken atm. – Graipher Jan 24 '18 at 14:11
  • Oops my bad .. so its like i need binary constraints of the form that the result chooses one row from each group (column X; maybe like multi choice knapsack )along with another constraint that sum(Y) for the rows selected mult be <=23 – blehblehbleh Jan 24 '18 at 14:26
  • what is your question here? Is it to find a `x,y` pair where the sum of `x+y<23`? – usernamenotfound Jan 24 '18 at 14:37
  • 2
    Not that hard to do. But show some preliminary attempts or research first (looks like: *write that code for me right now*)! – sascha Jan 24 '18 at 14:38
  • @Usernamenotfound No. He wants to maximize z by picking rows, while there can be only one row selected for each x and all corresponding y-vals of selected rows should be <= 23 summed up. – sascha Jan 24 '18 at 14:41
  • I am sorry i am not so proficient in python; I was trying to use Pulp but while specifying the objective function or constraint how do i specify the selected row as in I think writing model += pulp.lpSum(df['Z']) won't be sufficient – blehblehbleh Jan 24 '18 at 14:56
  • You obviously have to play by pulps rules. Read the docs. Pulp does not know pandas dataframes. The question is missing details. There are alternatives to pulp, but some of them are not easy to install. But this is not tackled / specified in the question at all. The real-world dimensions are also missing. This might have some effects on the modelling-tool and solver. – sascha Jan 24 '18 at 15:11
  • Context - For each item, we create a separate model that predicts value and one that predicts cost at different states. Each item can be in one of possible 11 states. Each state has an associated value/reward along with cost and revenue. I need to select one of the state for each item such that sum(value) is maximized and global ROI, i.e. sum(revenue)/sum(cost)>= target_roi. This is very similar to dummy problem I have mentioned above and have also implemented the solution in R using lpSolve. Now I need to implement the same in python and wasn't very clear about syntax. – blehblehbleh Jan 24 '18 at 16:31

1 Answers1

0

For those who would want some help to get started with pulp in python can refer to http://ojs.pythonpapers.org/index.php/tppm/article/view/111

Github repo- https://github.com/coin-or/pulp/tree/master/doc/KPyCon2009 could be handy as well.

Below is the code in python for the dummy problem asked

            import pandas as pd
            import pulp

            X=[1,1,1,2,2,2,3,3,3]
            Y=[9,7,5,9,7,5,9,7,5]
            Z=[25,20,5,20,10,5,10,5,5]

            df = pd.DataFrame({'X':X,'Y':Y,'Z':Z})
            allx = df['X'].unique()
            possible_values = [(w,b) for w in allx for b in range(1,4)]

            x = pulp.LpVariable.dicts('arr', (allx, range(1,4)),
                                        lowBound = 0,
                                        upBound = 1,
                                        cat = pulp.LpInteger)

            model = pulp.LpProblem("Optim", pulp.LpMaximize)
            model += sum([x[w][b]*df[df['X']==w].reset_index()['Z'][b-1] for (w,b) in possible_values])

            model += sum([x[w][b]*df[df['X']==w].reset_index()['Y'][b-1] for (w,b) in possible_values]) <= 23, \
                                        "Maximum_number_of_Y"

            for value in allx:
                model += sum([x[w][b] for (w,b) in possible_values if w==value])>=1

            for value in allx:
                model += sum([x[w][b] for (w,b) in possible_values if w==value])<=1

            ##View definition
            model

            model.solve()

            print("The choosen rows are out of a total of %s:"%len(possible_values))
            for v in model.variables():
                print v.name, "=", v.varValue

For solution in R

    d=data.frame(x=c(1,1,1,2,2,2,3,3,3),y=c(9,7,5,9,7,5,9,7,5),z=c(25,20,5,20,10,5,10,5,3))
            library(lpSolve)
            all.x <- unique(d$x)
            d[lp(direction = "max",
                 objective.in = d$z,
                 const.mat = rbind(outer(all.x, d$x, "=="), d$y),
                 const.dir = rep(c("==", "<="), c(length(all.x), 1)),
                 const.rhs = rep(c(1, 23), c(length(all.x), 1)),
                 all.bin = TRUE)$solution == 1,]
blehblehbleh
  • 154
  • 6