While implementing neural network training algorithms, I came across several concepts, including gradient descent, which tries to mimic a ball rolling down a hill, and velocity and momentum, which model the rolling ball more closely.
I initialized my weights, weight_deltas, and weight_velocities thus:
sizes = [2, 3, 1]
momentum_coefficient = 0.5
weights = [ 2 * np.random.random((a, b)) - 1 for a, b in zip(sizes[:-1], sizes[1:]) ]
weight_velocities = [ np.ones(w.shape) for w in weights ]
weight_deltas = [ np.zeros(w.shape) for w in weights ]
After calculating the deltas (the derivatives of the cost function with respect to the weights), I updated the weights thus:
for l in xrange(len(sizes) - 1):
    weight_velocities[l] = (momentum_coefficient * weight_velocities[l]) - weight_deltas[l]
    weights[l] += weight_velocities[l]
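For completeness, here is a minimal self-contained sketch of this momentum update with zero-initialised velocities (the layer sizes and coefficient mirror the ones above; the gradients are fake placeholders just to exercise the step):

```python
import numpy as np

np.random.seed(0)
sizes = [2, 3, 1]
momentum_coefficient = 0.5

weights = [2 * np.random.random((a, b)) - 1
           for a, b in zip(sizes[:-1], sizes[1:])]
# Velocities start at zero, so the first step is plain gradient descent.
weight_velocities = [np.zeros(w.shape) for w in weights]

def momentum_step(weights, weight_velocities, weight_deltas):
    """Apply one momentum update in place."""
    for l in range(len(sizes) - 1):
        weight_velocities[l] = (momentum_coefficient * weight_velocities[l]
                                - weight_deltas[l])
        weights[l] += weight_velocities[l]

# Placeholder gradients (constant 0.1) just to demonstrate the update.
deltas = [0.1 * np.ones(w.shape) for w in weights]
momentum_step(weights, weight_velocities, deltas)
```

Because the velocities begin at zero, the first update reduces to `v = -delta`, i.e. an ordinary gradient step.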
I used np.zeros to initialise my velocities and was able to get up to 80% accuracy (on a particular dataset), but when I initialised with np.ones I could not get above 20% accuracy. I've since been using zeros, but I can't figure out why ones doesn't work. There's also the random method from numpy.
What's the recommended approach to initialising the weight_velocities?
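To illustrate the difference the initial velocity makes (a hypothetical one-step comparison, not taken from my actual training run): with velocities started at ones, the very first update pushes every weight by momentum_coefficient even when the gradient is zero, whereas zero-initialised velocities leave the weights untouched:

```python
import numpy as np

momentum_coefficient = 0.5
w_shape = (2, 3)
zero_grad = np.zeros(w_shape)  # pretend the gradient is exactly zero

results = {}
for init in (np.zeros, np.ones):
    w = np.zeros(w_shape)      # same starting weights in both cases
    v = init(w_shape)          # zero vs. ones velocity initialisation
    # One momentum step with a zero gradient:
    v = momentum_coefficient * v - zero_grad
    w += v
    results[init.__name__] = w
```

With zeros, `w` stays at 0 as it should; with ones, every weight is shifted by 0.5 despite there being no gradient signal at all.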
Notice that I intentionally excluded the bias units and the learning rate, and that I'm importing numpy as np.