Portfolio Selection in Python with constraints from a fixed set

Question

I am working on a project where I am trying to select the optimal subset of players from a set of 125 players (example below)

The constraints are:

a) Number of players = 3

b) Sum of prices <= 30

The optimization function is Max(Sum of Votes)

        Player  Vote  Price
  William Smith  0.67    8.6
Robert Thompson  0.31    6.7
Joseph Robinson  0.61    6.2
Richard Johnson  0.88    4.3
   Richard Hall  0.28    9.7

I looked at the scipy.optimize package, but I can't find a way to constrain the universe to this subset. Can anyone point me to a library that would do that? Thanks

It sounds like you're trying to do simulated annealing, for which there is a non-scipy package located here: https://github.com/perrygeo/simanneal. Alternatively, the basinhopping function in scipy.optimize might work, as it is meant to replace the scipy.optimize.anneal, but I haven't used it personally and don't presently have time to try it out. — unsupervised_learner, May 17 '17 at 02:28

Tristan · Answer 1 · 2017-05-17T14:27:34.023

The problem is well suited to being formulated as a mathematical program and can be solved with various optimization libraries.

It is known as the exact k-item knapsack problem.
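Written out, the model looks roughly like this. The symbols are my own notation rather than anything from the question: x_i is 1 if player i is selected and 0 otherwise, v_i is the player's vote, c_i the price, and n the number of players (125 here).

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{n} v_i x_i \\
\text{s.t.}\quad & \sum_{i=1}^{n} x_i = 3, \\
& \sum_{i=1}^{n} c_i x_i \le 30, \\
& x_i \in \{0, 1\}, \quad i = 1, \dots, n.
\end{aligned}
```

The equality constraint is what makes this the "exact k-item" variant: a plain knapsack would only have the budget constraint.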

You can use the PuLP package, for example. It has interfaces to different optimization software packages, but comes bundled with a free solver.

pip install pulp

Free solvers are often way slower than commercial ones, but I think PuLP should be able to solve reasonably large versions of your problem with its standard solver.

Your problem can be solved with PuLP as follows:

from pulp import *

# Data input
players = ["William Smith", "Robert Thompson", "Joseph Robinson", "Richard Johnson", "Richard Hall"]
vote = [0.67, 0.31, 0.61, 0.88, 0.28]
price = [8.6, 6.7, 6.2, 4.3, 9.7]

P = range(len(players))

# Declare problem instance, maximization problem
prob = LpProblem("Portfolio", LpMaximize)

# Declare decision variable x, which is 1 if a
# player is part of the portfolio and 0 else
x = LpVariable.matrix("x", list(P), 0, 1, LpInteger)

# Objective function -> Maximize votes
prob += sum(vote[p] * x[p] for p in P)

# Constraint definition
prob += sum(x[p] for p in P) == 3
prob += sum(price[p] * x[p] for p in P) <= 30

# Start solving the problem instance
prob.solve()

# Extract solution
portfolio = [players[p] for p in P if x[p].varValue]
print(portfolio)

The runtime to draw 3 players from 125 with the same random data as used by Brad Solomon is 0.5 seconds on my machine.

Wow thanks a lot Tristan - it reminded me that I actually had a class on the knapsack problem a while back...perhaps I should have paid more attention :) Your solution worked perfectly on my machine, thanks vm again — Karimb, May 17 '17 at 23:07

score 1 · Answer 2 · edited May 23 '17 at 11:54

Your problem is a discrete optimization task because of constraint a). You should introduce discrete variables to represent whether each player is taken or not. Consider the following MiniZinc pseudocode:

array[players_num] of var bool: taken_players;
array[players_num] of float: votes;
array[players_num] of float: prices;

constraint sum (taken_players * prices) <= 30;
constraint sum (taken_players) = 3;

solve maximize sum (taken_players * votes);

As far as I know, you can't use scipy to solve such problems.

You can solve your problem in these ways:

1. Generate a MiniZinc model in Python and solve it by calling an external solver. This seems to be more scalable and robust.
2. Use simulated annealing.
3. Use a mixed-integer programming approach.

The second option seems to be simpler for you. Personally, though, I prefer the first one: it lets you introduce a wide range of constraints, and the problem formulation feels more natural and clear.
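As a rough illustration of the simulated annealing option, here is a minimal sketch in plain Python on the five sample players from the question. The function name anneal, the penalty weight of 10.0, the linear cooling schedule and the step count are all assumptions picked for illustration, not tuned values. Being a heuristic, it is not guaranteed to return the optimum.

```python
import math
import random

# Sample data from the question
votes = [0.67, 0.31, 0.61, 0.88, 0.28]
prices = [8.6, 6.7, 6.2, 4.3, 9.7]


def anneal(votes, prices, k=3, budget=30, steps=2000, t0=1.0, seed=0):
    """Heuristic search for k players maximizing votes within the budget.

    Hypothetical helper: penalty weight and cooling schedule are arbitrary.
    Requires len(votes) > k so that a swap is always possible.
    """
    rng = random.Random(seed)
    n = len(votes)

    def score(subset):
        # Total votes minus a penalty for overshooting the budget
        over = max(0.0, sum(prices[i] for i in subset) - budget)
        return sum(votes[i] for i in subset) - 10.0 * over

    # Start from a random subset of exactly k players
    current = rng.sample(range(n), k)
    best = list(current)
    for step in range(steps):
        # Linear cooling; the small offset avoids division by zero
        temp = t0 * (1 - step / steps) + 1e-9
        # Neighbour: swap one chosen player for one that is not chosen
        candidate = list(current)
        outside = [i for i in range(n) if i not in current]
        candidate[rng.randrange(k)] = rng.choice(outside)
        delta = score(candidate) - score(current)
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temp)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if score(current) > score(best):
                best = list(current)
    return sorted(best)


chosen = anneal(votes, prices)
print(chosen, sum(prices[i] for i in chosen))
```

Because every move swaps one player in and one out, the "exactly 3 players" constraint holds by construction, and only the budget needs a penalty term.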

Brad Solomon · Answer 3 · 2017-05-17T03:32:34.723

@CaptainTrunky is correct, scipy.optimize.minimize will not work here.

Here is an awfully crappy workaround using itertools; please ignore it if one of the other methods has worked. Consider that drawing 3 players from 125 creates 317,750 combinations, n!/((n - k)! * k!). Runtime of the main loop is about 6 minutes.

from itertools import combinations

import numpy as np
import pandas as pd
from pandas import DataFrame

df = DataFrame({'Player' : np.arange(0, 125),
                'Vote' : 10 * np.random.random(125),
                'Price' : np.random.randint(1, 10, 125)
                })

df
Out[109]: 
     Player  Price     Vote
0         0      4  7.52425
1         1      6  3.62207
2         2      9  4.69236
3         3      4  5.24461
4         4      4  5.41303
..      ...    ...      ...
120     120      9  8.48551
121     121      8  9.95126
122     122      8  6.29137
123     123      8  1.07988
124     124      4  2.02374

players = df.Player.values
idx = pd.MultiIndex.from_tuples([i for i in combinations(players, 3)])

votes = []
prices = []

for i in combinations(players, 3):
    vote = df[df.Player.isin(i)].sum()['Vote']
    price = df[df.Player.isin(i)].sum()['Price']
    votes.append(vote); prices.append(price)

result = DataFrame({'Price' : prices, 'Vote' : votes}, index=idx)

# The index below is (first player, second player, third player)

result[result.Price <= 30].sort_values('Vote', ascending=False)
Out[128]: 
           Price      Vote
63 87 121   25.0  29.75051
   64 121   20.0  29.62626
64 87 121   19.0  29.61032
63 64 87    20.0  29.56665
   65 121   24.0  29.54248
         ...       ...
18 22 78    12.0   1.06352
   23 103   20.0   1.02450
22 23 103   20.0   1.00835
18 22 103   15.0   0.98461
      23    14.0   0.98372
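Most of that runtime is likely spent filtering the DataFrame inside the loop rather than enumerating the combinations. A minimal sketch of the same brute-force idea on plain Python tuples, using the five sample players from the question, could look like this. The function name best_portfolio and the tuple layout are my own assumptions for illustration.

```python
from itertools import combinations

# Sample rows from the question: (player, vote, price)
players = [
    ("William Smith", 0.67, 8.6),
    ("Robert Thompson", 0.31, 6.7),
    ("Joseph Robinson", 0.61, 6.2),
    ("Richard Johnson", 0.88, 4.3),
    ("Richard Hall", 0.28, 9.7),
]


def best_portfolio(rows, k=3, budget=30):
    """Return (total_vote, names) for the best feasible k-subset, or None."""
    best = None
    for combo in combinations(rows, k):
        # Skip subsets that exceed the budget
        if sum(price for _, _, price in combo) > budget:
            continue
        total_vote = sum(vote for _, vote, _ in combo)
        # Keep the subset with the highest total vote seen so far
        if best is None or total_vote > best[0]:
            best = (total_vote, [name for name, _, _ in combo])
    return best


print(best_portfolio(players))
```

This still visits every combination, so it only stays practical while k is small; it keeps just the best subset instead of building a table of all 317,750 results.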
