
Recently I started thinking about implementing the Levenberg-Marquardt algorithm for training an Artificial Neural Network (ANN). The key to the implementation is computing a Jacobian matrix. I spent a couple of hours studying the topic, but I still can't figure out how to compute it exactly.

Say I have a simple feed-forward network with 3 inputs, 4 neurons in the hidden layer and 2 outputs. The layers are fully connected. I also have a learning set that is 5 rows long.

  1. What exactly should be the size of the Jacobian matrix?
  2. What exactly should I put in place of the derivatives? (Examples of the formulas for the top-left and bottom-right corners, along with some explanation, would be perfect.)

This really doesn't help:

[image: the general definition of the Jacobian as the matrix of partial derivatives ∂Fi/∂xj of a vector-valued function F(x)]

What are F and x in terms of a neural network?

Andrzej Gis

3 Answers


The Jacobian is the matrix of all first-order partial derivatives of a vector-valued function. In the neural network case, it is an N-by-W matrix, where N is the number of entries in our training set and W is the total number of parameters (weights + biases) of our network. It can be created by taking the partial derivatives of each output with respect to each weight, and has the form:

    J = | ∂F(x1, w)/∂w1   ∂F(x1, w)/∂w2   ...   ∂F(x1, w)/∂wW |
        | ∂F(x2, w)/∂w1   ∂F(x2, w)/∂w2   ...   ∂F(x2, w)/∂wW |
        |       ...             ...       ...         ...     |
        | ∂F(xN, w)/∂w1   ∂F(xN, w)/∂w2   ...   ∂F(xN, w)/∂wW |

where F(xi, w) is the network function evaluated for the i-th input vector of the training set using the weight vector w, and wj is the j-th element of that weight vector. In traditional Levenberg-Marquardt implementations, the Jacobian is approximated by finite differences. For neural networks, however, it can be computed very efficiently by using the chain rule of calculus and the first derivatives of the activation functions.
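
To make this concrete, here is a rough NumPy sketch of the finite-difference approach for the 3-4-2 network from the question. The tanh hidden activation, the linear output layer and the way the parameters are packed into one flat vector are illustrative assumptions, and each Jacobian row corresponds to one (training sample, output) pair, so you get 5 * 2 = 10 rows and 20 weights + 6 biases = 26 columns:

    import numpy as np

    # Rough sketch: finite-difference Jacobian for a 3-4-2 feed-forward network.
    # Assumptions (not from the answer itself): tanh hidden units, linear outputs,
    # and one Jacobian row per (training sample, output) pair.
    SHAPES = [(4, 3), (4,), (2, 4), (2,)]          # W1, b1, W2, b2

    def unpack(w):
        """Split the flat parameter vector w back into W1, b1, W2, b2."""
        params, i = [], 0
        for s in SHAPES:
            n = int(np.prod(s))
            params.append(w[i:i + n].reshape(s))
            i += n
        return params

    def forward(x, w):
        """The network function F(x, w): 3 inputs -> 4 hidden -> 2 outputs."""
        W1, b1, W2, b2 = unpack(w)
        h = np.tanh(W1 @ x + b1)
        return W2 @ h + b2

    def jacobian(X, w, eps=1e-6):
        """Approximate dF/dw by forward differences, one column per parameter."""
        base = np.concatenate([forward(x, w) for x in X])
        J = np.zeros((base.size, w.size))
        for j in range(w.size):
            w_step = w.copy()
            w_step[j] += eps
            J[:, j] = (np.concatenate([forward(x, w_step) for x in X]) - base) / eps
        return J

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))                    # the 5-row learning set
    w = rng.normal(size=4*3 + 4 + 2*4 + 2)         # 26 parameters in total
    print(jacobian(X, w).shape)                    # (10, 26)

In a real Levenberg-Marquardt implementation you would replace the finite differences with the chain-rule derivatives mentioned above, but the resulting matrix has the same shape.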

abhinash
  • So in the example from my question the number of matrix columns = 3 * 4 + 4 * 2 = 20 (let's forget about the biases for a moment) and the number of matrix rows is the same as the number of rows in the data set (5)? What should the F function look like? – Andrzej Gis Oct 01 '14 at 15:50
  • @abhinash Are the gradient and the Jacobian the same? Because even the gradient matrix would be of size NxW, as we plug it into the gradient descent equation – shaifali Gupta Sep 14 '17 at 17:18

So, from my experience working with ANNs and backpropagation:

  1. The Jacobian matrix organizes all the partial derivatives into an m x n matrix, where m is the number of outputs and n is the number of inputs. So in your case it should be 2x3.

  2. So let's say the outputs are indexed F1 through Fk (F in your picture) and the inputs are indexed x1 through xi (x in your picture). The formula for each entry is then:

            ∂Fk
    Jki =  -----
            ∂xi


Sorry, I don't know how to format formulas properly here, but I hope my answer is clear enough.
If you have any questions about my answer, please ask in the comments!
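
If it helps, here is a small sketch (my own, using a placeholder linear map called net in place of the trained network) that builds the 2x3 matrix J[k, i] = ∂Fk/∂xi by finite differences:

    import numpy as np

    # Illustrative sketch: numerically build J[k, i] = dF_k / dx_i for 2 outputs
    # and 3 inputs. `net` is just a stand-in linear map for the trained network,
    # so the result can be checked against A by eye.
    A = np.arange(6.0).reshape(2, 3)

    def net(x):
        return A @ x

    def jacobian_wrt_inputs(f, x, eps=1e-6):
        y0 = f(x)
        J = np.zeros((y0.size, x.size))            # 2 outputs x 3 inputs
        for i in range(x.size):
            x_step = x.copy()
            x_step[i] += eps
            J[:, i] = (f(x_step) - y0) / eps
        return J

    print(jacobian_wrt_inputs(net, np.ones(3)))    # approximately equal to A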

Niko Adrianus Yuwono
  • What should the F function look like? Also Abhinash in his answer suggested that the size of the matrix is different than what you proposed (if I understand him right). Maybe if I see the F function it'll be more clear. – Andrzej Gis Oct 01 '14 at 22:00
  1. You have an input vector X of shape (3,1) and a function G(x) that maps it to the output vector Y of shape (4,1), which is your hidden layer. This function G(x) is implemented by the weight matrix W, which has shape (4,3). So in terms of matrix multiplication you have

    • Y = WX + b, where b is the bias and has the same shape as Y, namely (4,1)

In this case the Jacobian of Y with respect to X is your weight matrix W: W contains the derivative of each element of Y with respect to each element of X (a quick numerical check follows below).

  • J_X(Y) = W
  • (Additionally, J_W(Y) = X^T)
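
As a quick numerical check of the claim that J_X(Y) = W, here is a short sketch with randomly chosen W, b and X (the names and sizes are just for illustration):

    import numpy as np

    # Perturb each element of X and check that the Jacobian of Y = WX + b
    # with respect to X matches W.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 3))
    b = rng.normal(size=(4, 1))
    X = rng.normal(size=(3, 1))

    eps = 1e-6
    J = np.zeros((4, 3))
    for i in range(3):
        dX = np.zeros((3, 1))
        dX[i, 0] = eps
        J[:, i] = ((W @ (X + dX) + b) - (W @ X + b)).ravel() / eps

    print(np.allclose(J, W, atol=1e-4))            # True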

  2. The top-left and bottom-right corners are determined by the input and output vector dimensions:

  • X = [X1, X2, ..., Xn]
  • Y = [Y1, Y2, ..., Ym], or as in your question, F = [F1, F2, ..., Fm]

A matrix describing the derivative of each element of Y (that is, F) with respect to each element of X will have m*n elements and shape (m,n).

For detailed calculations and further reading, go through the following links:

  1. http://cs231n.stanford.edu/handouts/linear-backprop.pdf
  2. http://cs231n.stanford.edu/handouts/derivatives.pdf
  3. https://web.stanford.edu/class/cs224n/readings/gradient-notes.pdf
adityassrana