I'm trying to use MXNet to do some constrained optimization that isn't backpropagation in a feedforward network, but involves similar computations: products of large arrays, some gradient descent, and so on.
For example, to minimize the trace of M-2*Id as M varies over the set of orthogonal matrices, I could use numpy and scipy to do this by vectorizing the matrices, as in the following:
import numpy as np
from scipy.optimize import minimize

# Matrix-to-vector and vector-to-matrix helper functions
def toVector(m):
    return m.flatten()

def toMatrix(vec):
    return vec[:4*4].reshape(4, 4)

# Objective function to minimize: trace(M - 2*I)
def f(x):
    matM = toMatrix(x)
    return np.trace(matM - 2*np.identity(4))

# Constraint that M be orthogonal, i.e. M M^t = I
cons = ({'type': 'eq',
         'fun': lambda x: np.array(np.linalg.norm(
             np.dot(toMatrix(x), np.transpose(toMatrix(x))) - np.eye(4)))
         })

# Define an initial point randomly
m0 = np.random.rand(4, 4)

# And minimize
result = minimize(f, toVector(m0), constraints=cons,
                  method='SLSQP', options={'disp': True})
toMatrix(result.x)
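(As a quick sanity check, reusing the helpers above, I can verify that the returned matrix is close to orthogonal and look at the objective value; the exact printout is just for illustration.)

M_opt = toMatrix(result.x)
print(np.linalg.norm(M_opt.dot(M_opt.T) - np.eye(4)))  # should be near 0
print(np.trace(M_opt - 2*np.eye(4)))                    # value of the objective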
Now, suppose that I'm doing this kind of computation for NxN matrices where N is large, and I want to repeat the computation many times, updating some parameters along the way. Is there a good way to do this kind of constrained optimization with MXNet, so that it runs across GPU cores and computes the constraint gradients, without vectorizing the input and using the feedforward-network workaround described in simple-gradient-descent-using-mxnet?
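For concreteness, this is roughly the kind of loop I'm imagining, written as a sketch with mxnet.autograd on raw NDArrays, where the hard orthogonality constraint is replaced by a soft penalty; the learning rate, penalty weight, and step count are placeholder assumptions, not tuned values:

import mxnet as mx
from mxnet import nd, autograd

N = 4                      # in practice N would be large
ctx = mx.cpu()             # or mx.gpu(0) when a GPU is available

M = nd.random.uniform(shape=(N, N), ctx=ctx)
M.attach_grad()
I = nd.eye(N, ctx=ctx)

lr = 0.01                  # learning rate (assumption)
penalty = 10.0             # weight on the orthogonality penalty (assumption)

for step in range(2000):
    with autograd.record():
        # objective: trace(M - 2*I), written as an elementwise sum
        obj = ((M - 2 * I) * I).sum()
        # soft version of the constraint M M^t = I
        ortho = ((nd.dot(M, M, transpose_b=True) - I) ** 2).sum()
        loss = obj + penalty * ortho
    loss.backward()
    M[:] = M - lr * M.grad  # plain gradient-descent update, stays on the device

Something like this runs, but it hand-rolls the update and only approximates the constraint, so I'm asking whether MXNet has a better-supported way to express the constrained problem directly.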