I'm trying to use MXNet to do some constrained optimization that isn't backpropagation in a feedforward network, but involves similar computations: products of large arrays, some gradient descent, and so on.
For example, to minimize the trace of M-2*Id as M varies over the set of orthogonal matrices, I could use numpy and scipy to do this by vectorizing the matrices, as in the following:
import numpy as np
from scipy.optimize import minimize

# Matrix-to-vector and vector-to-matrix helper functions
def toVector(m):
    return m.flatten()

def toMatrix(vec):
    return vec[:4*4].reshape(4, 4)

# Objective function to minimize: trace(M - 2*I)
def f(x):
    matM = toMatrix(x)
    return np.trace(matM - 2*np.identity(4))

# Constraint that M be orthogonal, i.e. M M^t = I
cons = ({'type': 'eq',
         'fun': lambda x: np.array(np.linalg.norm(
             np.dot(toMatrix(x), np.transpose(toMatrix(x))) - np.eye(4)))
         })

# Define an initial point randomly
m0 = np.random.rand(4, 4)

# And minimize
result = minimize(f, toVector(m0), constraints=cons,
                  method='SLSQP', options={'disp': True})
toMatrix(result.x)
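(As a quick sanity check, reusing the helpers above, I can verify that the returned matrix is close to orthogonal and look at the objective value; the exact printout is just for illustration.)

M_opt = toMatrix(result.x)
print(np.linalg.norm(M_opt.dot(M_opt.T) - np.eye(4)))  # should be near 0
print(np.trace(M_opt - 2*np.eye(4)))                    # value of the objective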
Now, suppose that I'm doing this kind of computation for NxN matrices where N is large, and I want to repeat the computation many times, updating some parameters along the way. Is there a good way to do this kind of constrained optimization with MXNet, so that it runs across GPU cores and computes the constraint gradients, without vectorizing the input and using the feedforward-network workaround described in simple-gradient-descent-using-mxnet?
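For concreteness, this is roughly the kind of loop I'm imagining, written as a sketch with mxnet.autograd on raw NDArrays, where the hard orthogonality constraint is replaced by a soft penalty; the learning rate, penalty weight, and step count are placeholder assumptions, not tuned values:

import mxnet as mx
from mxnet import nd, autograd

N = 4                      # in practice N would be large
ctx = mx.cpu()             # or mx.gpu(0) when a GPU is available

M = nd.random.uniform(shape=(N, N), ctx=ctx)
M.attach_grad()
I = nd.eye(N, ctx=ctx)

lr = 0.01                  # learning rate (assumption)
penalty = 10.0             # weight on the orthogonality penalty (assumption)

for step in range(2000):
    with autograd.record():
        # objective: trace(M - 2*I), written as an elementwise sum
        obj = ((M - 2 * I) * I).sum()
        # soft version of the constraint M M^t = I
        ortho = ((nd.dot(M, M, transpose_b=True) - I) ** 2).sum()
        loss = obj + penalty * ortho
    loss.backward()
    M[:] = M - lr * M.grad  # plain gradient-descent update, stays on the device

Something like this runs, but it hand-rolls the update and only approximates the constraint, so I'm asking whether MXNet has a better-supported way to express the constrained problem directly.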