You already got a working answer from @constantstranger. IMO, there's just one problem: it's quite slow. More precisely, it took more than a minute to solve the problem on my machine. So here are some notes on how the solver can be sped up:
Since Python has a noticeable overhead for function calls, it's a good idea to keep the number of calls the solver makes as small as possible. For instance, evaluating one vector-valued constraint function is faster than evaluating multiple scalar constraint functions, as the sketch below illustrates.
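For illustration, here's a minimal sketch with a made-up constraint g_i(w) = w[i]**2 - 1 >= 0 (not part of the original problem):

# N separate scalar constraints -> N Python function calls per evaluation
N = 100
cons_scalar = [{'type': 'ineq', 'fun': lambda w, i=i: w[i]**2 - 1} for i in range(N)]

# one vector-valued constraint -> a single Python function call per evaluation
cons_vector = [{'type': 'ineq', 'fun': lambda w: w**2 - 1}]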
At the moment, all derivatives (the objective gradient and the constraint Jacobian) are approximated by finite differences. This is a real bottleneck, since each evaluation of an approximated derivative requires multiple evaluations of the objective/constraint functions. Instead, it's highly recommended to provide exact derivatives or to use algorithmic differentiation; see the sketch after this paragraph.
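If hand-coding the derivatives feels error-prone, an algorithmic differentiation package can generate exact ones for you. A minimal sketch with jax (assuming it is installed; uw here is a toy stand-in for the real data used below):

import jax.numpy as jnp
from jax import grad, jit

uw = jnp.array([0.1, 0.2, 0.3])   # toy data

def obj(w):
    return jnp.sum((w - uw)**2) / jnp.sum(uw)

# exact, jit-compiled gradient instead of finite differences
obj_grad = jit(grad(obj))
print(obj_grad(jnp.ones(3)))      # equals 2*(w - uw) / sum(uw)

The resulting obj_grad could then be passed via the jac keyword of the solver.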
Last but not least, scipy.optimize.minimize is really only suited for small to mid-sized problems. If you are willing to use another package, you could use IPOPT, a state-of-the-art NLP solver. The cyipopt package provides a scipy-like interface, so switching from scipy.optimize.minimize isn't hard.
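To see how small the change is, a sketch (with obj, grad, w0, bounds and cons as defined further below):

# before: scipy's SLSQP
from scipy.optimize import minimize
res = minimize(obj, x0=w0, jac=grad, bounds=bounds, constraints=cons)

# after: IPOPT via cyipopt, same call signature
from cyipopt import minimize_ipopt
res = minimize_ipopt(obj, x0=w0, jac=grad, bounds=bounds, constraints=cons)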
Apart from that, your problem is a (convex) quadratic optimization problem and can be formulated as follows:
min f(w) s.t. A*w <= 0.5, u_l <= w <= 0.05
with
f(w) = (1/sum(u_w)) * ||w - u_w||_2^2 = (1/sum(u_w)) * (w'w - 2*u_w'*w + u_w'*u_w)
where A[i,j] = 1 if w[j] belongs to sector i and 0 otherwise.
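As a quick sanity check of this expansion with made-up data:

import numpy as np

rng = np.random.default_rng(0)
w, u_w = rng.random(5), rng.random(5)

lhs = np.sum((w - u_w)**2) / u_w.sum()               # ||w - u_w||_2^2 / sum(u_w)
rhs = (w @ w - 2*(u_w @ w) + u_w @ u_w) / u_w.sum()  # expanded quadratic form
assert np.isclose(lhs, rhs)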
Then, solving the problem with IPOPT (note that we pass the exact derivatives) looks like this:
import numpy as np
import pandas as pd
from cyipopt import minimize_ipopt
# dataframe
df = pd.read_csv("https://raw.githubusercontent.com/norhther/datasets/main/data(1).csv")
df = df.drop("Unnamed: 0", axis=1)
# sectors
sectors = df["Sector Code"].unique()
# building the matrix A
A = np.zeros((sectors.size, len(df)))
for i, sec in enumerate(sectors):
    indices = df[df["Sector Code"] == sec].index.values
    A[i, indices] = 1
uw = df['uw'].values
uw_sum = uw.sum()
# objective
def obj(w):
    return np.sum((w - uw)**2) / uw_sum
# objective gradient
def grad(w):
    return (2*w - 2*uw) / uw_sum
# Linear Constraint A @ w <= 0.5 <=> 0.5 - A @ w >= 0
cons = [{'type': 'ineq', 'fun': lambda w: 0.5 - A @ w, 'jac': lambda w: -A}]
# variable bounds
bounds = [(u_i, 0.05) for u_i in df.ul.values]
# feasible initial guess
w0 = np.ones(len(df)) / len(df)
# solve the problem
res = minimize_ipopt(obj, x0=w0, jac=grad, bounds=bounds, constraints=cons)
print(res)
On my machine, this terminates in less than 2 seconds and yields
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit https://github.com/coin-or/Ipopt
******************************************************************************
fun: 0.4306218505716169
info: {'x': array([0.05 , 0.05 , 0.03128946, ..., 0.04103131, 0.05 ,
0.03348038]), 'g': array([-3.51687688, -9.45217602, -7.88799127, -1.78825803, -1.86650095,
-5.09092925, -2.11181422, -1.35485327, -1.15847276, 0.35 ]), 'obj_val': 0.4306218505716169, 'mult_g': array([-1.000000e+03, -1.000000e+03, -1.000000e+03, -1.000000e+03,
-1.000000e+03, -1.000000e+03, -1.000000e+03, -1.000000e+03,
-1.000000e+03, -2.857166e-09]), 'mult_x_L': array([1000.02960821, 1000.02197802, 1000.00000005, ..., 1000.00000011,
1000.02728049, 1000.00000006]), 'mult_x_U': array([0.00000000e+00, 0.00000000e+00, 5.34457820e-08, ...,
1.11498931e-07, 0.00000000e+00, 6.05340266e-08]), 'status': 2, 'status_msg': b'Algorithm converged to a point of local infeasibility. Problem may be infeasible.'}
message: b'Algorithm converged to a point of local infeasibility. Problem may be infeasible.'
nfev: 13
nit: 9
njev: 7
status: 2
success: False
x: array([0.05 , 0.05 , 0.03128946, ..., 0.04103131, 0.05 ,
0.03348038])
[Finished in 1.9s]