
I'm building a neural network with the architecture:

input layer --> fully connected layer --> ReLU --> fully connected layer --> softmax

I'm using the equations outlined here (DeepLearningBook) to implement backprop. I think my mistake is in eq. 1. When differentiating the loss, do I consider each example independently, yielding an N x C (no. of examples x no. of classes) matrix, or all examples together, yielding an N x 1 matrix?
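
For concreteness, the loss I'm differentiating is the cross-entropy averaged over the N examples (which is where the /N in the code below comes from). Written out per element:

L = -(1/N) * sum_i log(a2[i, y_i])
dL/da2[i, j] = -1 / (N * a2[i, y_i]) if j == y_i, and 0 otherwise

so dL/da2 is an N x C matrix with exactly one non-zero entry per row.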

import numpy as np

# derivative of softmax: da2[i, j] = a2[i, y_i] * ((j == y_i) - a2[i, j])
da2 = -a2    # a2 comprises activation values of output layer (N x C)
da2[np.arange(N), y] += 1
da2 *= (a2[np.arange(N), y])[:, None]

# derivative of ReLU
da1 = a1.copy()    # a1 comprises activation values of hidden layer
da1[a1 > 0] = 1    # 1 where the ReLU passed its input through, 0 elsewhere

# eq. 1
mask = np.zeros(a2.shape)
mask[np.arange(N),y] = 1
delta_2 = ((1/a2) * mask) * da2 / N 
# delta_L = - (1 / a2[np.arange(N),y])[:,None] * da2 / N

# eq.2
delta_1 = np.dot(delta_2,W2.T) * da1

# eq. 3
grad_b1 = np.sum(delta_1,axis=0)
grad_b2 = np.sum(delta_2,axis=0)

# eq. 4
grad_w1 = np.dot(X.T,delta_1)
grad_w2 = np.dot(a1.T,delta_2)
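
One way to decide which variant of eq. 1 is right is a centered-difference numerical gradient check (this is also what the CS231n assignment does). A minimal sketch, where loss_fn and param are placeholders rather than names from the code above:

import numpy as np

def numerical_gradient(loss_fn, param, h=1e-5, num_checks=10):
    # Estimate d(loss)/d(param) at a few randomly chosen entries of param
    # using centered finite differences. loss_fn takes no arguments and
    # returns the scalar loss computed with the current contents of param.
    grad = np.zeros_like(param)
    for _ in range(num_checks):
        idx = tuple(np.random.randint(s) for s in param.shape)
        old = param[idx]
        param[idx] = old + h
        loss_plus = loss_fn()
        param[idx] = old - h
        loss_minus = loss_fn()
        param[idx] = old  # restore the original entry
        grad[idx] = (loss_plus - loss_minus) / (2 * h)
    return grad

Comparing grad at those sampled indices against the analytic grad_w2, grad_b2, etc. makes it easy to tell which version of delta_2 is consistent with the loss.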

Oddly, the commented-out line in eq. 1 returns the correct values for the biases, but I can't seem to justify using that equation, since it computes an N x 1 matrix which is then broadcast against the corresponding rows of da2.
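
For reference, expanding that commented line algebraically gives the usual combined softmax-plus-cross-entropy gradient, which is again a plain N x C matrix. A sketch of that simplified form, assuming a2 holds the softmax probabilities and y the integer class labels:

# combined softmax + cross-entropy gradient (what the commented line works out to)
delta_2 = a2.copy()
delta_2[np.arange(N), y] -= 1
delta_2 /= N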

Edit: I'm working through the assignment problems of the CS231n course, which can be found here: CS231n

  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. Specifically, please include the "required" code to illustrate the problem: your posted code should run as given, by itself. – Prune Dec 27 '16 at 18:48
  • Also, what is your purpose in doing this? There are several good frameworks that supply the desired layers and connections for you, with backprop included. – Prune Dec 27 '16 at 18:49
  • It's primarily for learning. – inSearchofAnswers Dec 27 '16 at 20:18

1 Answer


I also couldn't find an explanation of this anywhere else, so I wrote a post :) Please read it here.

Nimit Pattanasri