I am trying to implement a 3-layer neural network with feedforward and backpropagation.
My cost function is tested and works fine, and my gradient function also seems OK.
However, when I try to optimize the parameters with fmin_cg from scipy.optimize, I get this warning:
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 4.643489
Iterations: 1
Function evaluations: 123
Gradient evaluations: 110
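For reference, this is roughly how I call the optimizer (just a sketch; nn_cost, nn_gradient and initial_theta are placeholder names for my actual cost function, the gradient function shown below, and the flattened initial weights):

from scipy.optimize import fmin_cg

result = fmin_cg(f=nn_cost,          # cost function (placeholder name)
                 x0=initial_theta,   # flattened initial weights
                 fprime=nn_gradient, # the gradient function shown below
                 args=(input_layer_size, hidden_layer_size, num_labels, x, y, lambda_),
                 maxiter=50)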
I searched for this warning and someone said the problem is probably in the gradient. This is my gradient code:
theta_flatten = theta_flatten.reshape(1,-1)
# retrieve theta values from flattened theta
theta_hidden = theta_flatten[0,0:((input_layer_size+1)*hidden_layer_size)]
theta_hidden = theta_hidden.reshape((input_layer_size+1),hidden_layer_size)
theta_output = theta_flatten[0,((input_layer_size+1)*hidden_layer_size):]
theta_output = theta_output.reshape(hidden_layer_size+1,num_labels)
# start of section 1
a1 = x # 5000x401
z2 = np.dot(a1,theta_hidden) # 5000x25
a2 = sigmoid(z2)
a2 = np.append(np.ones(shape=(a1.shape[0],1)),a2,axis = 1) # 5000x26 # adding column of 1's to a2
z3 = np.dot(a2,theta_output) # 5000x10
a3 = sigmoid(z3) # a3 = h(x) w.r.t theta
a3 = rotate_column(a3) # reorder columns so that digit 0 maps to column 0 instead of column 10
# end of section 1
# start of section 2
delta3 = a3 - y # 5000x10
# end of section 2
# start of section 3
delta2 = (np.dot(delta3,theta_output.transpose()))[:,1:] # 5000x25 # drop delta2(0)
delta2 = delta2*sigmoid_gradient(z2)
# end of section 3
# start of section 4
DELTA2 = np.dot(a2.transpose(),delta3) # 26x10
DELTA1 = np.dot(a1.transpose(),delta2) # 401x25
# end of section 4
# start of section 5
theta_hidden_ = np.append(np.ones(shape=(theta_hidden.shape[0],1)),theta_hidden[:,1:],axis = 1) # regularization
theta_output_ = np.append(np.ones(shape=(theta_output.shape[0],1)),theta_output[:,1:],axis = 1) # regularization
D1 = DELTA1/a1.shape[0] + (theta_hidden_*lambda_)/a1.shape[0]
D2 = DELTA2/a1.shape[0] + (theta_output_*lambda_)/a1.shape[0]
# end of section 5
Dvec = np.append(D1,D2)
return Dvec
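To sanity-check the analytic gradient above, I also tried comparing it against a central finite-difference approximation, roughly like this (a sketch; nn_cost and nn_gradient are placeholders for my actual cost and gradient functions, and I only check a few random components because the full check is slow):

import numpy as np

def numerical_gradient(cost, theta, epsilon=1e-4, n_checks=20):
    # approximate d cost / d theta_i with central differences for a few random components i
    theta = theta.ravel()
    idx = np.random.choice(theta.size, n_checks, replace=False)
    num_grad = np.zeros(n_checks)
    for k, i in enumerate(idx):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += epsilon
        minus[i] -= epsilon
        num_grad[k] = (cost(plus) - cost(minus)) / (2 * epsilon)
    return idx, num_grad

# compare against the analytic gradient (nn_gradient is the function shown above)
analytic = nn_gradient(theta_flatten, input_layer_size, hidden_layer_size, num_labels, x, y, lambda_)
idx, num_grad = numerical_gradient(
    lambda t: nn_cost(t, input_layer_size, hidden_layer_size, num_labels, x, y, lambda_),
    theta_flatten)
print(np.column_stack((num_grad, analytic.ravel()[idx])))  # the two columns should nearly match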
I looked at other people's implementations on GitHub, but nothing helped; they implement it the same way I do.
Some comments:
Section one: feedforward implementation
Sections two to four: backpropagation from the output layer back to the input layer (the equations I am following are written out below)
Section five: aggregating the gradients
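For reference, these are the equations I believe sections two to five should implement (vectorized over the m = 5000 training examples, with g the sigmoid; the bias weights are not supposed to be regularized):

$$\delta^{(3)} = a^{(3)} - y$$
$$\delta^{(2)} = \left(\delta^{(3)}\,\Theta_{output}^{T}\right)_{:,1:} \circ g'(z^{(2)})$$
$$\Delta^{(1)} = (a^{(1)})^{T}\,\delta^{(2)}, \qquad \Delta^{(2)} = (a^{(2)})^{T}\,\delta^{(3)}$$
$$D^{(l)} = \frac{1}{m}\,\Delta^{(l)} + \frac{\lambda}{m}\,\Theta^{(l)} \quad\text{(regularization term only for the non-bias weights)}$$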
Please help
Thank you