
In MXNet, how would I create a vector of weights that multiplies each input, i.e. compute w*x_i, and then backprop over the weights w?

I tried:

 y_hat = input
 w1 = mx.sym.Variable("w1")
 y_hat = mx.symbol.broadcast_mul(w1, y_hat)
Drew
  • Please give some example inputs and show the expected output you want – Shivkumar kondi Aug 07 '17 at 06:57
  • Let's say `x = [[1, 2, 3], [4, 5, 6]]` and `y = [12, 30]`. I would like to have a parameter `w` which I can train such that `y = w*x`, so if `w = 2` then `w*x = [[2,4,6],[8,10,12]]`, i.e. an element-wise multiplication of the vector `w # size=(num_features, 1)` with each feature of a sample in `x` – Drew Aug 07 '17 at 07:00

1 Answer


You can cast the computation in terms of a dot product:

import mxnet as mx

x = mx.nd.array([[1, 2, 3], [4, 5, 6]])
w = mx.nd.array([2, 2, 2])
mx.nd.dot(w, x.T)

will result in [ 12. 30.] as you wish.

Now just initialize w randomly, compute a loss between the output and your target output, and then backpropagate. You can use the new gluon interface for that (http://gluon.mxnet.io/).

Specifically, let's look at a minimal example adapted from http://mxnet.io/tutorials/gluon/gluon.html and http://gluon.mxnet.io/P01-C05-autograd.html

Prepare the data

import mxnet as mx
from mxnet import autograd, gluon

label = mx.nd.array([12, 30])
x = mx.nd.array([[1, 2, 3], [4, 5, 6]])
w = mx.nd.random.normal(shape=(3,))  # random initial weights
w.attach_grad()  # ask autograd to allocate and fill gradients for w

And train

loss_fn = gluon.loss.L2Loss()

with autograd.record():
    output = mx.nd.dot(w, x.T)
    loss = loss_fn(output, label)
loss.backward()

Don't forget to update the weights with the gradient computed in the backward pass. The gradient will be available in w.grad. Run the training code together with the weight update in a loop, as a single update likely won't suffice for convergence.
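For completeness, here is a minimal sketch of such a training loop using plain SGD; the learning rate and iteration count are arbitrary choices for this toy problem, not values from the answer:

import mxnet as mx
from mxnet import autograd, gluon

label = mx.nd.array([12, 30])
x = mx.nd.array([[1, 2, 3], [4, 5, 6]])
w = mx.nd.random.normal(shape=(3,))
w.attach_grad()

loss_fn = gluon.loss.L2Loss()
lr = 0.01  # learning rate, an arbitrary choice for this toy problem

for i in range(1000):
    with autograd.record():
        output = mx.nd.dot(w, x.T)
        loss = loss_fn(output, label)
    loss.backward()
    # manual SGD step: w.grad was filled by the backward pass above
    w[:] = w - lr * w.grad

print(w.asnumpy())                  # learned weights
print(mx.nd.dot(w, x.T).asnumpy())  # should be close to [12, 30]

In practice you would typically let a gluon.Trainer handle the update, but the explicit w[:] = w - lr * w.grad step makes it clear where w.grad enters.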

leezu
  • Is there a concrete example of this? I thought you had to initialize (I think it's called bind) some arguments for gradients for anything you want to backprop over. How does it know which variable to backprop over? – Drew Aug 07 '17 at 15:18
  • That is indeed needed if you use the symbolic API. The advantage of the symbolic API is that it is faster. However, the newly introduced gluon API which I refer to above is easier to develop with, so I recommend getting your idea working there first. I am editing the post above to add some more information to the example. – leezu Aug 07 '17 at 15:24
  • I don't have any gluon dependency in my fairly big code base. Does gluon play well with MXNet? I just want to add one parameter, so complicating the whole code base might be too much. Not to mention, speed is already an issue for me. Just curious, how would I do this in the classic symbolic API? – Drew Aug 07 '17 at 19:49
  • Gluon is part of mxnet (though only in the master branch, until the next release is made). For the symbolic API you could follow the same approach, just use mx.sym.dot instead of mx.nd.dot and define Variables for x, y and w. To get the gradient with respect to w you would need to use get_input_grads() of your module, as w is seen as "input data" (see the rough sketch after these comments). Alternatively, to have mxnet manage w as a weight array, you'd need to define your own operator. Depending on your codebase it might be easiest though to just mix the imperative / gluon API with your existing codebase. There is no problem with that. – leezu Aug 08 '17 at 03:37
  • Is there an example of defining w as an operator? It seems complicated either way – Drew Aug 09 '17 at 00:31
  • You can check http://mxnet.io/how_to/new_op.html and http://mxnet.io/architecture/overview.html#operators-in-mxnet , but probably it will be easier to just mix the imperative / gluon API with your existing codebase. – leezu Aug 09 '17 at 02:58
  • gluon is in prerelease though, right? It's not on my AWS instance – Drew Aug 10 '17 at 05:45
  • Yes. You need to compile mxnet yourself, which is quite straightforward though. – leezu Aug 10 '17 at 07:45
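As a rough illustration of the symbolic approach mentioned in the comments, here is one way to get the gradient with respect to w at the executor level. It is a sketch, not code from the answer: it uses simple_bind on a bare Symbol instead of a full Module, and supplies the L2 head gradient by hand; shapes and values are the toy example from the question.

import mxnet as mx

# Symbolic graph: y_hat = dot(w, x.T)
x = mx.sym.Variable('x')
w = mx.sym.Variable('w')
y_hat = mx.sym.dot(w, mx.sym.transpose(x))

# Bind concrete shapes; grad_req='write' requests gradient buffers for the inputs
exe = y_hat.simple_bind(mx.cpu(), x=(2, 3), w=(3,), grad_req='write')
exe.arg_dict['x'][:] = mx.nd.array([[1, 2, 3], [4, 5, 6]])
exe.arg_dict['w'][:] = mx.nd.random.normal(shape=(3,))

label = mx.nd.array([12, 30])
out = exe.forward(is_train=True)[0]
# For an L2 loss 0.5 * (out - label)^2 the head gradient is (out - label)
exe.backward(out - label)
print(exe.grad_dict['w'])  # gradient of the loss with respect to w

With the Module API one would instead bind with inputs_need_grad=True and read the result from get_input_grads(), as the comment above suggests; the executor version here just keeps the example short.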