I have implemented back-propagation for an MLP using the sigmoid activation function.
During the forward phase I store the output from each layer in memory.
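In rough pseudocode, the forward phase looks something like this (simplified; names like self.weights and self.outputs are just for illustration, and I have left out biases):

def forward(self, x):
    # apply each layer and keep its output for the backward phase
    self.outputs = []
    out = x
    for w in self.weights:
        a = w @ out               # pre-activation for this layer
        out = self.sigmoid(a)     # the activation I want to swap for relu
        self.outputs.append(out)
    return out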
After calculating the output error and the output gradient vector, I work backwards through the layers and calculate the hidden error for each one, using the current layer's output, the weights of layer +1, and the error of layer +1. I then use the hidden error together with the output of layer -1 to calculate that layer's gradient vector. Once back-propagation is complete, I update the weights using the gradient vectors calculated for each layer.
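Roughly, the backward phase does the following (again simplified, using the same illustrative names; self.lr is a learning rate):

def backward(self, x, target):
    # output error and output gradient (sigmoid derivative is out * (1 - out))
    error = (self.outputs[-1] - target) * self.outputs[-1] * (1 - self.outputs[-1])
    gradients = [np.outer(error, self.outputs[-2])]

    # hidden layers, walking back from the last hidden layer to the first
    for l in range(len(self.weights) - 2, -1, -1):
        # hidden error from this layer's output, the weights of layer +1,
        # and the error of layer +1
        error = (self.weights[l + 1].T @ error) * self.outputs[l] * (1 - self.outputs[l])
        prev_out = self.outputs[l - 1] if l > 0 else x
        gradients.insert(0, np.outer(error, prev_out))

    # once back-propagation is complete, update the weights
    for l, grad in enumerate(gradients):
        self.weights[l] -= self.lr * grad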
My question is about the implementation of the ReLU activation function. These are the functions I have for applying activations: the first (sigmoid) is the one I used in the initial run, and the other two (relu and its derivative) are for the ReLU run.
def sigmoid(self, a):
    # logistic sigmoid: squashes a into the range (0, 1)
    o = 1 / (1 + np.exp(-a))
    return o

def relu(self, a):
    # rectified linear unit: element-wise max(0, a)
    return np.maximum(0, a)

def reluDerivative(self, x):
    # derivative of relu: 1 where x > 0, else 0
    return 1. * (x > 0)
To implement the ReLU activation function, do I need to make any other changes during the forward phase or the back-propagation phase? I have read that I might need to calculate the ReLU derivative during the backward phase and apply it somewhere, but I am confused about how that fits into the steps above.
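From what I have read, my guess is that the only backward-phase change is to replace the sigmoid-derivative factor out * (1 - out) in the hidden-error line with reluDerivative, something like this:

# instead of
# error = (self.weights[l + 1].T @ error) * self.outputs[l] * (1 - self.outputs[l])
# I would write
error = (self.weights[l + 1].T @ error) * self.reluDerivative(self.outputs[l])

Is that roughly what is meant, and is it correct to apply reluDerivative to the stored output rather than the pre-activation? Appreciate any advice.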