1

Introduction

Using Python, I want to retrieve a set of solutions that satisfy the following equation describing an ellipsoid:

where H is a positive definite matrix.

In order to retrieve a vector x that satisfies the condition for an arbitrary positive definite matrix 30-dimensional matrix H and arbitrary y, I will minimize the following function:

from scipy.optimize import minimize
import numpy as np

N = 30
H = np.identity(N)
y = 5

def fct(x):
    return np.abs(np.dot(x, np.dot(H, x))-y)

x0 = np.ones(N)/ N
solution = minimize(fct, x0, method='Nelder-Mead')['x']

In light of my research goal, I want to retrieve many different solutions that satisfy the condition for a large number (=500) of different positive definite matrices H. As such, I will minimize the function many times with different starting values denoted by x0 for each matrix H. I would like to tailor my approach to the problem such that I gain some performance (in terms of computational time) from special characteristics of my problem.

Problem Analysis

I know very little of numerical optimization (plan on diving into it when I have the time) but after having read an interesting post here and some scipy lectures I have come to the following conclusions:

  • My problem is one of convex optimization given that H is a positive definite matrix.
  • I believe my problem is one of unconstrained optimization.
  • I am not sure whether my problem is well or ill conditioned. Solutions do change with a change in starting values but I think my problem is well conditioned.

Questions

Is it correct to state that in case H is a positive definite matrix my problem is an unconstrained convex optimization problem?

The Nelder-Mead algorithm does return solutions that 'make sense'. Could I gain a speed up by leveraging characteristics of my problem? Or potentially by use of a different optimization algorithm? I am aware that there are certain methods for which supplying the jacobian and hessian dramatically improve performance but I am not sure whether that applies to my problem.

Many thanks for any help.

Paul
  • 75
  • 7
  • Why Nelder-mead? Just use (L-)BFGS and provide the jacobian/hessian information, or just use a QP-solver, where you don't need any initial-points. Looking at the code, which is actually doing something different than the formula, you might think about your l1 vs. l2 penalty (you are doing l1) and in the latter case it can also be tackled by least-squares solvers. – sascha Jul 07 '18 at 12:08
  • If H is semidefinite, what is your definition of the inverse of H? Also, the equation uses the inverse of H, but the code creates and uses a matrix called H. Is the use of the inverse of H in the equation actually what you want? – Warren Weckesser Jul 07 '18 at 13:46
  • @sascha, the reason I did not initially used (L-)BFGS is that it seems to return only the equicoordinate solutions, that is, the solution vector x where all values coincide in each dimension. I am looking to retrieve many solutions that will serve as input to further analysis, including nonequi solutions. But thank you for your answer. I guess Nelder-Mead is a bit slower than (L-BFGS). Also: how is my code doing somehting different than the formula? I corrected for the inverse of a positive definite matrix. – Paul Jul 07 '18 at 17:29
  • @WarrenWeckesser I am actually dealing with positive definite matrices. I corrected for this in the question! I indeed need the inverse of H, which is done now as the inverse of an identity matrix is the identity matrix. – Paul Jul 07 '18 at 17:31

2 Answers2

3

The problem that you are solving is an equation that can be expressed as F(x, params) = 0, so it is more natural to use a root-finding function rather than a minimization function. In this case, you have 30 variables and just one equation, so the problem is highly underdetermined, and you would have to deal with that if you used a root-finding function.

The equation defines the level set of a positive definite quadratic form, so you know (as you pointed out) that the set of solutions is an (hyper)ellipsoid. You can take advantage of this structure to easily generate points on the ellipsoid without resorting to a numerical solver.

Three sets of points that are not hard to generate are: the points where the ellipsoid intersects the coordinate axes; the extrema of the ellipsoid relative to the coordinate axes (i.e. the points where a bounding box is tangent to the ellipsoid); and the points where the principal axes of the ellipsoid intersect the ellipsoid. You can also generate random points on the ellipsoid by first generating random points on the unit hypersphere, and then transforming those points to the ellipsoid. This is also demonstrated in the script. Finally, if x is a solution, then so is -x. (That fact is not used in the script.)

Here's a script that creates the sets of points described above. I use 10 variables instead of 30 to limit the amount of printed output.

import numpy as np
from scipy.linalg import sqrtm
from scipy.stats import ortho_group


#--------------------------------------------
# Generate example input
#--------------------------------------------

dim = 10

# Create a positive definite matrix H with shape (dim, dim).
# evals is the vector of eigenvalues of H.  This can be replaced
# with any vector of length `dim` containing positive values.
evals = (np.arange(1, dim + 1)/2)**2
# Use a random orthogonal matrix to generate H.
R = ortho_group.rvs(dim)
H = R.T.dot(np.diag(evals).dot(R))

# y determines the level set to be computed.
y = 3.0

#--------------------------------------------
# Compute various points on the ellipsoid
#--------------------------------------------

Hinv = np.linalg.inv(H)

# Ellipsoid intercepts of the coordinate axes
xintercepts = np.diag(np.sqrt(y / np.diag(Hinv)))

# Ellipsoid coordinate extrema
xextrema = H.dot(np.diag(np.sqrt(y / np.diag(H))))

# Ellipsoid intersections with principal axes
evals, evecs = np.linalg.eigh(Hinv)
xprincipal = np.sqrt(y/evals)*evecs

# Generate random points on the ellipsoid.
# The distribution of the random points is not uniform on the ellipsoid.
nsample = 5
r = np.random.randn(H.shape[0], nsample)
# u contains random points on the hypersphere with radius 1.
u = r / np.linalg.norm(r, axis=0)
xrandom = sqrtm(H).dot(np.sqrt(y)*u)

#--------------------------------------------
# Print results
#--------------------------------------------

np.set_printoptions(precision=4, linewidth=120)

print("Columns are the ellipsoid coordinate axis intercepts:")
print(xintercepts)
print("Check:", np.array([x.dot(Hinv.dot(x)) for x in xintercepts.T]))

print()

print("Columns are points where the ellipsoid is tangent to its bounding box:")
print(xextrema)
print("Check:", np.array([x.dot(Hinv.dot(x)) for x in xextrema.T]))

print()

print("Columns are points where the principal axes of the ellipsoid intersect the ellipsoid:")
print(xprincipal)
print("Check:", np.array([x.dot(Hinv.dot(x)) for x in xprincipal.T]))

print()

print("Columns are random points on the ellipsoid:")
print(xrandom)
print("Check:", np.array([x.dot(Hinv.dot(x)) for x in xrandom.T]))

Here's the output when the script is run. H is random, so the output will be different each time the script is run. The line beginning with Check: after each matrix shows the result of computing x.dot(Hinv.dot(x)) for each column. The values shown after Check: should all be y (with a small tolerance to allow for normal numerical imprecision).

Columns are the ellipsoid coordinate axis intercepts:
[[ 2.9341  0.      0.      0.      0.      0.      0.      0.      0.      0.    ]
 [ 0.      2.752   0.      0.      0.      0.      0.      0.      0.      0.    ]
 [ 0.      0.      2.0081  0.      0.      0.      0.      0.      0.      0.    ]
 [ 0.      0.      0.      4.3061  0.      0.      0.      0.      0.      0.    ]
 [ 0.      0.      0.      0.      3.4531  0.      0.      0.      0.      0.    ]
 [ 0.      0.      0.      0.      0.      1.7946  0.      0.      0.      0.    ]
 [ 0.      0.      0.      0.      0.      0.      3.5877  0.      0.      0.    ]
 [ 0.      0.      0.      0.      0.      0.      0.      1.8925  0.      0.    ]
 [ 0.      0.      0.      0.      0.      0.      0.      0.      1.2435  0.    ]
 [ 0.      0.      0.      0.      0.      0.      0.      0.      0.      2.9075]]
Check: [ 3.  3.  3.  3.  3.  3.  3.  3.  3.  3.]

Columns are points where the ellipsoid is tangent to its bounding box:
[[ 4.2205 -1.6619  0.9486 -1.0971  0.0672  0.0988  2.0569 -0.2758  0.4345 -1.1246]
 [-2.392   6.0745 -0.1126  0.262   0.1952  0.7275 -1.829  -3.3036  0.3353 -0.4161]
 [ 1.2413 -0.1023  5.5224 -1.9892  1.3277  0.6662  0.6003 -2.6505 -1.0728  0.65  ]
 [-1.3493  0.2239 -1.8697  5.1908  0.0367 -0.9623  0.1469  1.5594 -0.3419  0.4431]
 [ 0.0917  0.1851  1.385   0.0407  5.7611 -2.3986  0.884   0.1803 -2.3425 -1.4966]
 [ 0.1385  0.7082  0.7134 -1.0963 -2.4621  5.9136 -2.6088 -0.8365  4.7242 -0.2406]
 [ 2.5176 -1.5554  0.5615  0.1462  0.7926 -2.2789  5.1657  0.0131 -1.8862 -0.0804]
 [-0.3829 -3.1874 -2.8128  1.7606  0.1834 -0.829   0.0149  5.8607 -1.2844  2.1692]
 [ 0.4579  0.2456 -0.8642 -0.293  -1.8089  3.5539 -1.6244 -0.9749  4.4486 -1.8706]
 [-1.4002 -0.3599  0.6185  0.4485 -1.3651 -0.2138 -0.0818  1.945  -2.2096  5.2549]]
Check: [ 3.  3.  3.  3.  3.  3.  3.  3.  3.  3.]

Columns are points where the principal axes of the ellipsoid intersect the ellipsoid:
[[-0.4701  1.1006 -3.008   1.1252 -0.8771  0.3136 -1.0045  2.0293  0.2615 -0.0509]
 [ 3.2968  1.5837  4.3875 -0.4139 -0.9073 -0.2182 -1.6669  0.5303 -0.3532  0.2126]
 [ 0.9165  3.6613 -1.9066 -2.764   1.0091  1.727   0.7535  0.1335 -0.5318  0.3281]
 [-1.5272 -1.9226  2.4674  1.0623 -0.8445  3.5199  0.6734  0.3716  0.018  -0.0609]
 [-2.2736  3.3975  1.2729  1.3181  3.3424  0.6659 -1.0468 -0.2749  0.5697 -0.0946]
 [ 4.6168 -2.0728 -1.9233 -0.4141  1.2233  1.3196 -1.3411 -0.447  -0.3202 -0.3887]
 [-2.7789  1.8796 -1.6604  0.6303 -2.7613  0.7791 -1.534  -1.2874 -0.1734  0.0563]
 [-3.7189 -3.8936 -0.3987 -0.0928  1.7785 -0.2127 -1.1247  0.2619 -0.7321  0.3339]
 [ 3.2782 -1.6147 -1.2554  1.8212  0.203   0.5629 -0.1255 -0.4354  0.8151  0.5622]
 [-1.3981 -1.6701  0.4087 -4.573  -0.423   0.279  -0.8141  0.0621  0.9307 -0.0114]]
Check: [ 3.  3.  3.  3.  3.  3.  3.  3.  3.  3.]

Columns are random points on the ellipsoid:
[[ 1.5979 -1.8849  0.4878 -1.8069 -0.8968]
 [-0.6583 -0.304   0.5729 -0.6411  1.1796]
 [-0.857  -1.0776  1.4855 -2.2664 -1.6846]
 [ 0.1828  0.3555  2.4142  2.7053  3.0825]
 [-0.4522  0.6217  3.8429 -2.4424  0.0587]
 [-0.6646 -2.4867 -0.3544  2.0967 -2.9529]
 [-1.0759 -1.2378  0.7088 -0.561   0.2613]
 [ 2.6662  2.862   0.3385  0.9958  0.821 ]
 [-1.0947 -3.0513 -0.9832  2.4314 -1.6179]
 [ 0.9955  2.3737  0.14   -0.1343 -1.0845]]
Check: [ 3.  3.  3.  3.  3.]
Warren Weckesser
  • 110,654
  • 19
  • 194
  • 214
  • Many thanks @WarrenWeckesser for the elaborate and insightful answer. Your explanation clearly shows how I can take advantage of the problem structure to easily/ efficiently generate points on the ellipsoid without resorting to a numerical solver. Thank you very much! – Paul Jul 08 '18 at 12:00
  • One question @WarrenWeckesser. Is xrandom = sqrtm(H).dot(np.sqrt(y)*u) not supposed to be defined with the inverse of H: xrandom = sqrtm(Hinv).dot(np.sqrt(y)*u) ? – Paul Jul 08 '18 at 20:02
  • 1
    To map from the unit hypersphere to the hyperellipsoid, we multiply by the inverse square root of the matrix of the quadratic form. In your problem, the matrix of the quadratic form is Hinv, and its inverse is H. (You could also just try it; you'll see that it doesn't work.) – Warren Weckesser Jul 09 '18 at 14:09
1

The question you've provided is definitely unconstrained convex if H is positive definite. This is because the inverse of a positive definite matrix is also positive definite, so when you take the Jacobian, you get H inverse plus H inverse transposed, which would be positive semidefinite itself (See here and here). Because of this convexity, I'd recommend you use a convex solver like CVXPY. CVXPY has a bunch of different solver options too, so you can try out different QP options from this list and see which one gets the best runtime (although that might be an input dependent question here).

cmitch
  • 233
  • 1
  • 10
  • Thank you @cmitch for providing an answer to the question whether the provided question is one of unconstrained convex. – Paul Jul 08 '18 at 11:55