I implemented a deep learning neural network from scratch without using any python frameworks like tensorflow or keras.
The problem is no matter what i change in my code like adjusting learning rate or changing layers or changing no. of nodes or changing activation functions from sigmoid to relu to leaky relu, i end up with a training loss that starts with 6.98 but always converges to 3.24...
Why is that?
Please review my forward and back prop algorithms.Maybe there's something wrong in that which i couldn't identify.
My hidden layers use leaky relu and final layer uses sigmoid activation. Im trying to classify the mnist handwritten digits.
code:
#FORWARDPROPAGATION
for i in range(layers-1):
cache["a"+str(i+1)]=lrelu((np.dot(param["w"+str(i+1)],cache["a"+str(i)]))+param["b"+str(i+1)])
cache["a"+str(layers)]=sigmoid((np.dot(param["w"+str(layers)],cache["a"+str(layers-1)]))+param["b"+str(layers)])
yn=cache["a"+str(layers)]
m=X.shape[1]
cost=-np.sum((y*np.log(yn)+(1-y)*np.log(1-yn)))/m
if j%10==0:
print(cost)
costs.append(cost)
#BACKPROPAGATION
grad={"dz"+str(layers):yn-y}
for i in range(layers):
grad["dw"+str(layers-i)]=np.dot(grad["dz"+str(layers-i)],cache["a"+str(layers-i-1)].T)/m
grad["db"+str(layers-i)]=np.sum(grad["dz"+str(layers-i)],1,keepdims=True)/m
if i<layers-1:
grad["dz"+str(layers-i-1)]=np.dot(param["w"+str(layers-i)].T,grad["dz"+str(layers-i)])*lreluDer(cache["a"+str(layers-i-1)])
for i in range(layers):
param["w"+str(i+1)]=param["w"+str(i+1)] - alpha*grad["dw"+str(i+1)]
param["b"+str(i+1)]=param["b"+str(i+1)] - alpha*grad["db"+str(i+1)]