
When computing the delta values for a neural network after running backpropagation:

[image: the delta computation step from the backpropagation algorithm]

the value of delta(1) comes out as a scalar value. Shouldn't it be a vector?

Update:

Taken from http://www.holehouse.org/mlclass/09_Neural_Networks_Learning.html

Specifically: [image: the accumulation step Delta := Delta + delta * a^T from those notes]

blue-sky

1 Answer


First, you probably understand that in each layer we have n x m parameters (or weights) that need to be learned, so they form a 2-D matrix.

n is the number of nodes in the current layer.
m is the number of nodes in the previous layer plus 1 (for the bias unit).

We have n x m parameters because there is one connection between every node in the previous layer and every node in the current layer.
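
For example (the sizes here are illustrative, not from the question): if the previous layer has 3 nodes and the current layer has 4, the weight matrix for the current layer is 4 x (3 + 1) = 4 x 4, counting the bias unit.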

I am pretty sure that Delta (big delta) at layer L is used to accumulate the partial-derivative terms for every parameter at layer L, so you have a 2-D matrix of Delta at each layer as well. To update the entry in the i-th row (the i-th node in the current layer) and j-th column (the j-th node in the previous layer) of the matrix,

D_(i,j) := D_(i,j) + a_j * delta_i
note: a_j is the activation of the j-th node in the previous layer,
      delta_i is the error of the i-th node in the current layer,
so each entry accumulates the error in proportion to the activation that feeds into it.

Thus to answer your question, Delta should be a matrix.
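
A minimal NumPy sketch of this accumulation for a single training example (the sizes and variable names below are illustrative assumptions, not from the original post):

    import numpy as np

    # Illustrative sizes: previous layer has 3 nodes, current layer has 4.
    m = 3 + 1          # previous-layer nodes plus the bias unit
    n = 4              # current-layer nodes

    a = np.random.rand(m)       # activations of the previous layer (incl. bias), shape (m,)
    delta = np.random.rand(n)   # errors of the current layer, shape (n,)

    Delta = np.zeros((n, m))    # big-Delta accumulator, one entry per weight

    # The outer product has entry (i, j) equal to delta_i * a_j,
    # so this is exactly D_(i,j) += a_j * delta_i for all i, j at once.
    Delta += np.outer(delta, a)

    print(Delta.shape)  # (4, 4) -- a matrix, not a scalar

The outer product delta * a^T fills the whole matrix in one step, which is why the accumulated result is a matrix rather than a scalar.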

greeness
  • thanks, but my question is why a scalar is being output instead of a matrix, since error * (a)transpose comes out as a scalar. Perhaps the link I pointed to is incorrect? – blue-sky May 12 '16 at 21:00
  • error is n x 1 and the transpose of a is 1 x m, so the product is n x m. You probably calculated (1 x n) x (n x 1), which gives a scalar. – greeness May 12 '16 at 21:02
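
A two-line check of that shape argument (a sketch; the vectors are illustrative, and the same length only so that both products are defined):

    import numpy as np

    delta = np.random.rand(4)   # error vector, n x 1 with n = 4
    a = np.random.rand(4)       # activation vector, same length here

    print(np.outer(delta, a).shape)  # (4, 4): (n x 1) * (1 x m) gives an n x m matrix
    print(np.dot(delta, a))          # one number: (1 x n) * (n x 1) collapses to a scalar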