Optimization with Binary decision variable and simulated database

Question

I have a data frame of simulated observations and I am trying to figure an optimization to get the greatest amount of successes for a given level of risk. The issue is I have no clue how to program in the constraints. Here is a subset of my DF, Rows are teams and X1:X10 are simulated results of successes and failures

    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1   0  1  0  1  0  0  0  1  1   0
2   0  1  0  0  1  1  1  1  1   1
3   1  0  0  0  0  0  0  1  1   0
4   1  0  1  1  1  0  0  1  0   1
5   0  0  1  0  1  0  1  1  1   0
6   0  1  0  1  0  1  1  0  1   1
7   1  0  0  0  0  1  1  1  1   0
8   0  0  1  1  0  0  1  1  0   1
9   1  0  0  0  0  1  1  1  1   0
10  0  1  1  0  0  1  0  1  0   0

The goal would be to maximize sum(colMeans(DF))

Based on the comment below I figured I would write out the math I am trying to solve. There are several different ways to do this it seems from research but I have no clue how to write the constraints. My larger dataset is 10,000 X 10,000.

One Approach

Objective Function = Max(average(sum(Simutaltion [i])

Constraints:

Observations are binary

Observations = 4

SD(portfolio)) < X , or Min(success) > 3

Other Approach:

Objective Function = Min(SD(sum(Simutaltion [i])) or Max((average(sum(Simutaltion [i])/(SD(portfolio)

Constraints:

Observations are binary

Observations = 4

Average(sum(Simulatin[i])) > 5

So I have been playing around with this some more, and I am starting to get somewhere.

library(quantprog)
library(Matrix)
dmat = cov(t(df))
dmat = dmat$mat
dvec = rowMeans(df)
n = nrow(dmat)
Amat = matrix(1,nrow = n)
bvec = 1
meq = 1
sol <- solve.QP(dmat, dvec = -dvec, Amat = Amat, meq = meq)
sol
$solution
 [1] -2.2700081 -1.0040420  5.9587712 -2.8265966  0.2037187  1.1641067 -2.0099030  1.5024252 -2.0099030
[10]  1.2914309

$value
[1] -0.730477

$unconstrained.solution
 [1]   6.750   6.750   6.750 -81.000 -38.250 -78.750 -12.375  36.000 -12.375 -29.250

$iterations
[1] 2 0

$Lagrangian
[1] 0.5626516

$iact
[1] 1

When I make this change to bvec = c(1, rep(0|-.25, n)) I get a solutions that is essentially all 0,

$solution
 [1]  6.018572e-17 -5.455221e-16  4.873879e-13 -4.176466e-15 -8.236146e-17 -2.663993e-16  9.764273e-17
 [8] -4.092027e-17 -2.630662e-13  0.000000e+00

Used tseries and got these results,

portfolio.optim(t(df), covmat = as.matrix(dmat) )
$pw
 [1] -6.146157e-17  8.336774e-20  1.818182e-01  2.272727e-01  1.212121e-01  3.030303e-01  0.000000e+00
 [8]  2.658696e-17  6.164972e-15  1.666667e-01

$px
 [1] 0.4090909 0.4696970 0.5151515 0.5303030 0.3484848 0.4696970 0.4242424 0.6969697 0.6060606 0.5303030

$pm
[1] 0.5

$ps
[1] 0.1005038

If I can get a weight constraint for tseries that would work too.

Any tips on editing the constraints also would like to add a min success rate as a constraint as well. Thanks,

score 2 · Answer 1 · answered Dec 05 '15 at 18:31

2

I would approach this in two steps. First write down the model in math, and then try to implement this model in a piece of software. Lets start with the simplest, a linear model. Using a standard deviation inside a constraint asks for heavy machinery (the model becomes an MINLP or at least quadratically constrained mixed integer problem).

I believe the mathematical model can be stated as:

Here n=4 and d is your data frame as a matrix. We can even simplify more by observing that the risk equation can be done during preprocessing: drop all teams from the dataframe if their rowsum is less than 2.

Now that we have the model we need to get it into a form accepted by a solver. I saw you used LpSolveAPI. As the model is very simple, it is not too difficult to cast this model in calls to the lpSolveAPI.

answered Dec 05 '15 at 18:31

Erwin Kalvelagen

15,677
2
14
39

Thanks for the advice. I can get this simplified version to run with the excel solver. Unfortunately the dataset is too large for excel to handle. If it would be easier to minimize sd for a given success rate that would work too. The only constraint I have is that the decision variable is binary. I thought a portfolio model with equal weights of 1/4 would work well but I just don't know how to program the constraints. – JB17 Dec 05 '15 at 20:20
The 2 models I was trying were a variant of max sharpe ratio, and efficient portfolio for a given return. With weights equal to 1/4. Do you have any tips on hope to program those constraints – JB17 Dec 05 '15 at 20:22
Minimization of the standard deviation is often better implemented as minimization of the variance. Taking the square of an objective is a monotonic operation, so the optimal solution is the same. – Erwin Kalvelagen Dec 06 '15 at 08:10
The Sharpe Ratio is return/standard deviation. So this is nonlinear, but reasonably well-behaved. The optimal Sharp Ratio portfolio is one point on the efficient frontier of a mean-variance portfolio model. In portfolio models the main constraint is that the sum of the allocations is one (the budget constraint). Certainly LpSolve will not be able to handle this as that is a linear solver. Excel actually has a capable solver (by default limited to small problems, but a commercial offering from Frontline Systems allow you to solve larger problems). – Erwin Kalvelagen Dec 06 '15 at 08:19
Setting up a constraint of 4 equally weighted assets should be trivial. Without loss of generality we can assume a budget of one. Then just fix the allocations of your 4 assets to 0.25. – Erwin Kalvelagen Dec 06 '15 at 08:22
Any suggestions on how to set up those 4 constraints. This is a hobby project so purchasing a licenses isn't worth it. I tried opensolver but did work with my old Mac. Figured R could do this better anyway just need some help with the set up. The bottom 2 codes were attempts for the efficient portfolios. – JB17 Dec 06 '15 at 11:32

Optimization with Binary decision variable and simulated database

1 Answers1