
I want to solve the following non-linear system of equations.

K x + sum_{k=1}^{N} alpha_k (A_k x + a_k) = 0

(1/2) x · (A_k x) + a_k · x = 0,    for k = 1, ..., N

Notes

  • the dot between a_k and x represents dot product.
  • the 0 in the first equation represents the zero vector, and the 0 in the second equation is the scalar 0
  • all the matrices are sparse if that matters.

Known

  • K is an n x n (positive definite) matrix
  • each A_k is a known (symmetric) matrix
  • each a_k is a known n x 1 vector
  • N is known (let's say N = 50). But I need a method where I can easily change N.

Unknown (trying to solve for)

  • x is an n x 1 vector.
  • each alpha_k, for 1 <= k <= N, is a scalar

My thinking.

I am thinking of using scipy.optimize.root to find x and each alpha_k. We essentially have n equations from the rows of the first (vector) equation and another N equations from the constraints, so the number of equations matches our n + N unknowns.
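
Roughly, I was imagining packing x and the alpha_k into a single vector and handing a stacked residual to root, along the lines of the sketch below (untested; it assumes K and the A_k are plain 2-D numpy arrays and the a_k are 1-D arrays, and `residual` is just a placeholder name I made up):

import numpy as np
from scipy.optimize import root

def residual(z, K, A, a):
    # z stacks the unknowns: z[:n] is x, z[n:] holds alpha_1 .. alpha_N.
    n = K.shape[0]
    x, alpha = z[:n], z[n:]
    # First (vector) equation: K x + sum_k alpha_k (A_k x + a_k)
    top = K.dot(x) + sum(alpha[k] * (A[k].dot(x) + a[k]) for k in range(len(A)))
    # Constraint equations: 0.5 x.(A_k x) + a_k.x for each k
    bottom = [0.5 * x.dot(A[k].dot(x)) + a[k].dot(x) for k in range(len(A))]
    return np.concatenate([top, bottom])

# z0 = np.concatenate([x0, alpha0])          # initial guess, length n + N
# sol = root(residual, z0, args=(K, A, a))
# x_sol, alpha_sol = sol.x[:n], sol.x[n:]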

I also have a reliable initial guess for x and the alpha_k's.

Toy example.

import numpy as np

n = 4
N = 2
K = np.matrix([[0.5, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0.5]])
A_1 = np.matrix([[0.98, 0, 0.46, 0.80], [0, 0, 0.56, 0], [0.93, 0.82, 0, 0.27], [0, 0, 0, 0.23]])
A_2 = np.matrix([[0.23, 0, 0, 0], [0.03, 0.01, 0, 0], [0, 0.32, 0, 0], [0.62, 0, 0, 0.45]])
a_1 = np.matrix(np.random.rand(4, 1))
a_2 = np.matrix(np.random.rand(4, 1))

We are trying to solve for

 x = [x1, x2, x3, x4] and alpha_1, alpha_2

Questions:

  1. I can actually brute force this toy problem and feed it to the solver. But how do I set it up so that it extends easily to the case where, say, n = 50 and N = 50?
  2. I will probably have to explicitly compute the Jacobian for larger problems, right? A rough sketch of what I think that Jacobian looks like is just below this list.
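
For what it's worth, here is my rough, untested sketch of the analytic Jacobian of the stacked system (unknowns ordered as x first, then the alphas, matching the residual above; K, A_k, a_k are assumed to be plain numpy arrays and `jacobian` is just an illustrative name):

import numpy as np

def jacobian(z, K, A, a):
    # Jacobian of the stacked residual with z = [x, alpha_1, ..., alpha_N].
    n, N = K.shape[0], len(A)
    x, alpha = z[:n], z[n:]
    J = np.zeros((n + N, n + N))
    # d/dx of (K x + sum_k alpha_k (A_k x + a_k)) = K + sum_k alpha_k A_k
    J[:n, :n] = K + sum(alpha[k] * A[k] for k in range(N))
    for k in range(N):
        # d/d(alpha_k) of the top block = A_k x + a_k
        J[:n, n + k] = A[k].dot(x) + a[k]
        # d/dx of (0.5 x.(A_k x) + a_k.x) = 0.5 (A_k + A_k.T) x + a_k
        J[n + k, :n] = 0.5 * (A[k] + A[k].T).dot(x) + a[k]
        # the constraints do not depend on alpha, so that block stays zero
    return J

# which could then be passed along as, e.g.,
# root(residual, z0, args=(K, A, a), jac=jacobian)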

Can anyone give me any pointers?

Kid Charlamagne
  • Have you looked at the scipy sparse linalg solvers? – hpaulj Feb 05 '17 at 10:43
  • Show the brute force approach. We need your code to start with. – hpaulj Feb 05 '17 at 12:34
  • This may be totally off but perhaps [cvxopt](http://cvxopt.org/userguide/index.html) may fit your problem. Just an idea. – Paul Panzer Feb 05 '17 at 12:42
  • `least_squares` supports sparse matrices. The docstring has an example of root-finding – ev-br Feb 05 '17 at 13:37
  • Some random comments: (1) It would help if you know some more specifics about that problem. Non-linear is a bit vague, is it also non-convex? (2) If it's non-convex, it can't be formulated within ```cvxopt``` (Paul's approach). (3) If it's convex, it might be possible to formulate it within cvxopt/cvxpy and co. and you will get a polynomial-time solver (there are still problems which are convex and can't be formulated within DCP @ cvxopt etc.) (4) Maybe i'm missing something, but that problem looks constrained and i don't see sparse.linalg or least_squares solving this. (5) Try ```SLSQP```. – sascha Feb 05 '17 at 17:12
  • Yes it is constrained. – Kid Charlamagne Feb 05 '17 at 20:44
  • I often solve (large) sparse nonlinear systems `F(X)=0` by `min r'r subject to F(X)=r` using a large sparse NLP solver such as CONOPT or IPOPT. This seems fairly reliable. – Erwin Kalvelagen Feb 23 '17 at 14:14

1 Answer


I think the scipy.optimize.root approach holds water, but steering clear of the trivial solution might be the real challenge for this system of equations.

In any event, this function uses root to solve the system of equations.

import numpy as np
from scipy.optimize import root

def solver(x0, alpha0, K, A, a):
    '''
    x0     - nx1 numpy array. Initial guess on x.
    alpha0 - Nx1 numpy array. Initial guess on alpha.
    K      - nxn numpy array.
    A      - Length N list of nxn numpy arrays.
    a      - Length N list of nx1 numpy arrays.
    '''

    # Establish the function that produces the rhs of the system of equations.
    n = K.shape[0]
    N = len(A)

    def lhs(x_alpha):
        '''
        x_alpha is a concatenation of x and alpha.
        '''
        x = np.ravel(x_alpha[:n])
        alpha = np.ravel(x_alpha[n:])

        # First (vector) equation: K x + sum_k alpha_k (A_k x + a_k)
        lhs_top = np.ravel(K.dot(x))
        for k in range(N):
            lhs_top += alpha[k]*(np.ravel(np.dot(A[k], x)) + np.ravel(a[k]))

        # Constraint equations: 0.5 x.(A_k x) + a_k.x for each k
        lhs_bottom = [0.5*x.dot(np.ravel(A[k].dot(x))) + np.ravel(a[k]).dot(x)
                      for k in range(N)]

        return np.array(lhs_top.tolist() + lhs_bottom)

    # Solve the system of equations.
    x0.shape = (n, 1)
    alpha0.shape = (N, 1)
    x_alpha_0 = np.vstack((x0, alpha0))
    sol = root(lhs, x_alpha_0)
    x_alpha_root = sol['x']

    # Compute norm of residual.
    res = sol['fun']
    res_norm = np.linalg.norm(res)

    # Break out the x and alpha components.
    x_root = x_alpha_root[:n]
    alpha_root = x_alpha_root[n:]

    return x_root, alpha_root, res_norm

Running on the toy example, however, only produces the trivial solution.

# Toy example.
n = 4
N = 2
K = np.matrix([[0.5, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0.5]])
A_1 = np.matrix([[0.98, 0, 0.46, 0.80], [0, 0, 0.56, 0],
                 [0.93, 0.82, 0, 0.27], [0, 0, 0, 0.23]])
A_2 = np.matrix([[0.23, 0, 0, 0], [0.03, 0.01, 0, 0],
                 [0, 0.32, 0, 0], [0.62, 0, 0, 0.45]])
a_1 = np.matrix(np.random.rand(4, 1))
a_2 = np.matrix(np.random.rand(4, 1))
A = [A_1, A_2]
a = [a_1, a_2]
x0 = np.random.rand(n, 1)
alpha0 = np.random.rand(N, 1)

print('x0 =', x0)
print('alpha0 =', alpha0)

x_root, alpha_root, res_norm = solver(x0, alpha0, K, A, a)

print('x_root =', x_root)
print('alpha_root =', alpha_root)
print('res_norm =', res_norm)

Output is

x0 = [[ 0.00764503]
 [ 0.08058471]
 [ 0.88300129]
 [ 0.85299622]]
alpha0 = [[ 0.67872815]
 [ 0.69693346]]
x_root = [  9.88131292e-324  -4.94065646e-324   0.00000000e+000        
          0.00000000e+000]
alpha_root = [ -4.94065646e-324   0.00000000e+000]
res_norm = 0.0
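
One simple thing that might help with the trivial-solution problem is a multistart wrapper around solver() above: re-run root from several perturbed copies of the initial guess and keep the first root whose x part is not numerically zero. A rough, untested sketch (the name `nonzero_root` and the defaults are purely illustrative):

import numpy as np

def nonzero_root(x0, alpha0, K, A, a, tries=20, scale=0.1, tol=1e-8):
    # Repeatedly call solver() from above, perturbing the starting point each
    # time, until a root with a non-trivial x component is found.
    for _ in range(tries):
        x_root, alpha_root, res_norm = solver(x0, alpha0, K, A, a)
        if res_norm < tol and np.linalg.norm(np.ravel(x_root)) > tol:
            return x_root, alpha_root, res_norm
        # Perturb the initial guess and try again.
        x0 = x0 + scale * np.random.randn(*x0.shape)
        alpha0 = alpha0 + scale * np.random.randn(*alpha0.shape)
    return None  # no non-trivial root found within the given number of tries

Since you say you have a reliable initial guess, starting the perturbations from that guess (rather than from random points) seems like the natural choice.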
heyiamt
  • Actually the A1 and A2 in the toy example are not symmetric as I have given them. I will go ahead and try symmetric matrices and see if that makes any difference. In any case this is a start. Thanks. – Kid Charlamagne Feb 09 '17 at 19:46
  • A couple of questions. 1. When you say `rhs` you mean `lhs` right? :) 2. What is the significance of computing the norm of residuals? The closer this value is to zero, the better our solution is, I guess? – Kid Charlamagne Feb 19 '17 at 10:32
  • @JackDawkins I had originally named it `rhs` as in the value that should be "converged to", but I think you're right that calling it `lhs` is more descriptive. – heyiamt Feb 20 '17 at 14:48
  • Re: norm of the residuals - residuals are errors in the approximate root. The smallest value they can be (in absolute value) is 0. If the norm of the residuals is 0, there's effectively 0 error. Minimizing some measure of residuals is a pretty common approach to solving systems of equations (least squares, l1-norm minimization, etc.) – heyiamt Feb 20 '17 at 15:18