
My optimization problem has ~200 variables, but I would like to find a solution that uses only 5 of them. That is, 195 should be zero and the other 5 can be nonzero.

I have tried the following constraint, but the optimization algorithm seems to ignore it completely: it uses all 200 variables whether or not I include the constraint. Am I missing something, or can SLSQP just not handle this?

import pandas as pd
import numpy as np
from scipy.optimize import minimize, Bounds

tmp = pd.DataFrame()
tmp.insert(loc=0,column='pred', value = np.random.random(200))
tmp['x0'] = 0

def obj(x, df=tmp):
  # Minimizing -x . pred is the same as maximizing x . pred
  return np.sum(-x * df['pred'].values)

def c2(x):
  # Intended: at most 5 nonzero entries (constraint >= 0 when satisfied)
  return -(len(x[x != 0]) - 5)

sol = minimize(fun=obj,x0=tmp['x0'],method='SLSQP',bounds=Bounds(-10,10),jac=False,
  constraints=({'type': 'ineq', 'fun': c2}),
  options={'maxiter': 1000})

When I run this, it just sets everything to 10 and ignores c2.

helloimgeorgia
  • Take a look at this answer: https://stackoverflow.com/a/63267534/12131013. The optimizer cannot figure out how to get close to satisfying the constraint so it is failing. (If you print `sol.success`, you'll see that it is `False`.) – jared Jun 07 '23 at 19:44
  • Thanks. I can't think of a way to make this constraint non-flat, though, since it's binary for each variable...? – helloimgeorgia Jun 07 '23 at 20:03
  • Ignoring how you'd do this in Python, let's consider how you'd solve this by hand. To me, the problem is essentially: given 200 numbers, select 5 of them to make the smallest number possible. To do that, I'd pick the 5 largest values (in magnitude) and multiply them by +10 or -10 (depending on their sign) and sum them to create the smallest number I can. You don't need scipy to do that. – jared Jun 08 '23 at 00:04
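To illustrate the "flat constraint" point from the comments: `c2` only counts nonzero entries, so it is piecewise constant, and the finite-difference gradient SLSQP computes from it is zero almost everywhere. A minimal sketch of that (not part of the original thread):

```python
import numpy as np

def c2(x):
    # Counts nonzero entries; piecewise constant in x
    return -(np.count_nonzero(x) - 5)

x = np.random.random(200)  # all entries nonzero
eps = 1e-8
# Finite-difference gradient, as SLSQP would estimate it:
# nudging any coordinate never changes the nonzero count.
grad = np.array([(c2(x + eps * np.eye(200)[i]) - c2(x)) / eps
                 for i in range(200)])
print(grad.max())  # 0.0 -- the constraint gives the optimizer no direction
```

With a gradient of zero everywhere, the optimizer has no way to move toward feasibility, which is why the constraint appears to be ignored.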

1 Answer


SLSQP (and indeed minimize in general) is not practical for such a problem: a nonzero-count constraint is piecewise constant, so it gives the optimizer no gradient to work with. But you can still use scipy. Use a linear program instead, where every variable is a binary selection flag. Your objective is linear since it's just a dot product.

import numpy as np
from numpy.random import default_rng
from scipy.optimize import linprog


rand = default_rng(seed=0)
pred = rand.random(200)
# We disregard the factor of 10. The maximum solution implies
# that all selected variables will be multiplied by 10.

result = linprog(
    c=-pred,        # maximize dot product of pred with x
    bounds=(0, 1),  # all selection variables binary
    integrality=np.ones_like(pred, dtype=bool),
    # Exactly 5 of the variables need to be used
    A_eq=np.ones((1, len(pred))), b_eq=[5],
)
print(result.message)
assert result.success
idx, = result.x.nonzero()
print('These indices were used:', idx)
print('These values were used:', pred[idx])
Optimization terminated successfully. (HiGHS Status 7: Optimal)
These indices were used: [ 26  77  94 171 194]
These values were used: [0.99720994 0.99509651 0.98119504 0.99491735 0.98194269]

But really, since pred is non-negative, this is much simpler:

import numpy as np
from numpy.random import default_rng


rand = default_rng(seed=0)
pred = rand.random(200)
print('Use these values:', np.sort(pred)[-5:])
Use these values: [0.98119504 0.98194269 0.99491735 0.99509651 0.99720994]

The results are the same.
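If pred could take negative values (it cannot here, since it comes from `random`), the shortcut above would need a small adjustment, following the reasoning in the comments: pick the 5 entries largest in magnitude and scale each by ±10 to match its sign. A hypothetical sketch:

```python
import numpy as np
from numpy.random import default_rng

rand = default_rng(seed=0)
pred = rand.random(200) - 0.5  # hypothetical signed version of pred

# Select the 5 entries with the largest magnitude...
idx = np.argsort(np.abs(pred))[-5:]

# ...and set each to +10 or -10 to match its sign,
# maximizing the dot product x . pred within the [-10, 10] bounds.
x = np.zeros_like(pred)
x[idx] = 10 * np.sign(pred[idx])

print('Objective value:', x @ pred)
```

This still needs no optimizer at all, for the same reason as the non-negative case.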

Reinderien