I want to perform SGD on the following neural network:
Training set size = 200000
input layer size = 784
hidden layer size = 50
output layer size = 10
I have an algorithm that performs batch gradient descent. I guess that to perform SGD, the cost function should be modified to do its calculations on a single training example (an array of size 784), and theta should then be updated after each training example. Is that the correct way of implementing SGD? If yes, I am not able to get the following cost function (written for batch gradient descent) to work on a single training example. How can I make it run on a single training example? If no, then what is the correct way to implement SGD on a neural network?
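To make the question concrete, the update loop I have in mind looks roughly like this sketch (alpha and numEpochs are hyperparameters I would choose myself, cost is the function below, and the single-example call is exactly the part that fails):

for epoch in range(numEpochs):
    # visit the training examples one at a time, in random order
    for i in np.random.permutation(trainSetSize):
        J, theta_grad = cost(theta, X[i], y[i], lamb)  # one training example
        theta = theta - alpha * theta_grad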
Python function that computes the cost and the gradient of theta for batch gradient descent:
import numpy as np

# sigmoid and sigmoidGradient are defined elsewhere in my code;
# inputLayerSize, hiddenLayerSize, outputLayerSize, trainSetSize
# and realTrainSetSize are globals
def cost(theta, X, y, lamb):
    # recover theta1 and theta2 from the unrolled theta vector
    th1 = theta[0:(hiddenLayerSize * (inputLayerSize + 1))].reshape((inputLayerSize + 1, hiddenLayerSize)).T
    th2 = theta[(hiddenLayerSize * (inputLayerSize + 1)):].reshape((hiddenLayerSize + 1, outputLayerSize)).T

    # matrices to store the gradients of theta1 and theta2
    th1_grad = np.zeros(th1.shape)
    th2_grad = np.zeros(th2.shape)

    # one-hot encode y so that each Y[i] has the size of the output layer
    I = np.identity(outputLayerSize, int)
    Y = np.zeros((realTrainSetSize, outputLayerSize))
    for i in range(0, realTrainSetSize):
        Y[i] = I[y[i]]

    # add a bias unit to each training example and forward propagate
    A1 = np.hstack([np.ones((realTrainSetSize, 1)), X])
    Z2 = A1 @ th1.T
    A2 = np.hstack([np.ones((len(Z2), 1)), sigmoid(Z2)])
    Z3 = A2 @ th2.T
    H = A3 = sigmoid(Z3)

    # regularized cost (bias columns excluded from the penalty)
    penalty = (lamb / (2 * trainSetSize)) * (np.sum(np.delete(th1, 0, 1) ** 2) + np.sum(np.delete(th2, 0, 1) ** 2))
    J = (1 / 2) * np.sum(-Y * np.log(H) - (1 - Y) * np.log(1 - H)) + penalty

    # backprop
    sigma3 = A3 - Y
    sigma2 = (sigma3 @ th2) * sigmoidGradient(np.hstack([np.ones((len(Z2), 1)), Z2]))
    sigma2 = np.delete(sigma2, 0, 1)  # drop the bias column
    delta_1 = sigma2.T @ A1  # getting dimension mismatch error here
    delta_2 = sigma3.T @ A2

    # gradients of theta1 and theta2 (bias column is not regularized)
    th1_grad = delta_1 / trainSetSize + (lamb / trainSetSize) * np.hstack([np.zeros((len(th1), 1)), np.delete(th1, 0, 1)])
    th2_grad = delta_2 / trainSetSize + (lamb / trainSetSize) * np.hstack([np.zeros((len(th2), 1)), np.delete(th2, 0, 1)])

    # unroll the gradients of theta1 and theta2 into a single vector
    theta_grad = np.concatenate(((th1_grad.T).ravel(), (th2_grad.T).ravel()))
    return (J, theta_grad)
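For reference, in the full-batch case (m = 200000 examples) the shapes line up as follows:

# A1: (m, 785)    th1: (50, 785)    Z2: (m, 50)
# A2: (m, 51)     th2: (10, 51)     Z3, H, sigma3: (m, 10)
# sigma2 (after deleting the bias column): (m, 50)
# delta_1: (50, 785)    delta_2: (10, 51)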
I am getting a dimension mismatch error while calculating delta_1 and delta_2 when I call this function with a single training example, but it works fine when called with the entire training batch.
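To illustrate, with a hypothetical index i these are the two calls:

J, grad = cost(theta, X, y, lamb)        # whole batch: works
J, grad = cost(theta, X[i], y[i], lamb)  # single example: dimension mismatch

I suspect the problem is that X[i] is a 1-D array of shape (784,) rather than a (1, 784) matrix, so the intermediate arrays lose a dimension, but I don't know how to restructure the function to handle it.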