I'm implementing a neural network and wanted to use ReLU as the activation function of the neurons. Furthermore, I'm training the network with SGD and back-propagation. I'm testing the network on the paradigmatic XOR problem, and so far it classifies new samples correctly if I use the logistic function or the hyperbolic tangent as the activation function.
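For context, the training setup is essentially the following simplified sketch (a small 2-3-1 network with the logistic function, trained one sample at a time; the layer sizes, learning rate and variable names here are illustrative, not my exact code):

import numpy as np

np.random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# XOR data: four samples, one target each
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# 2-3-1 network: weights and biases
W1 = np.random.randn(2, 3)
b1 = np.zeros((1, 3))
W2 = np.random.randn(3, 1)
b2 = np.zeros((1, 1))

lr = 0.5
for epoch in range(10000):
    for i in np.random.permutation(len(X)):   # stochastic: one sample per update
        x, t = X[i:i+1], y[i:i+1]

        # forward pass
        z1 = x @ W1 + b1
        a1 = sigmoid(z1)
        z2 = a1 @ W2 + b2
        a2 = sigmoid(z2)

        # backward pass for a squared-error loss
        delta2 = (a2 - t) * sigmoid_prime(z2)
        delta1 = (delta2 @ W2.T) * sigmoid_prime(z1)

        # SGD weight update
        W2 -= lr * a1.T @ delta2
        b2 -= lr * delta2
        W1 -= lr * x.T @ delta1
        b1 -= lr * delta1

print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2))   # typically approaches [0, 1, 1, 0]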
I've been reading about the benefits of using Leaky ReLU as an activation function, and implemented it in Python like this:
def relu(data, epsilon=0.1):
    return np.maximum(epsilon * data, data)
where np is the usual alias for NumPy. The associated derivative is implemented like this:
def relu_prime(data, epsilon=0.1):
    if 1. * np.all(epsilon < data):
        return 1
    return epsilon
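For what it's worth, the forward function itself behaves element-wise as I expect (the definition is repeated here so the snippet runs on its own; the sample values are just illustrative):

import numpy as np

def relu(data, epsilon=0.1):
    return np.maximum(epsilon * data, data)

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
# element-wise result: [-0.2, -0.05, 0.0, 0.5, 2.0]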
Using this as the activation function, I get incorrect results. For example:
Input = [0, 0] --> Output = [0.43951457]
Input = [0, 1] --> Output = [0.46252925]
Input = [1, 0] --> Output = [0.34939594]
Input = [1, 1] --> Output = [0.37241062]
It can be seen that the outputs differ greatly from the expected XOR ones. So the question is: is there any special consideration when using ReLU as the activation function?
Please, don't hesitate to ask me for more context or code. Thanks in advance.
EDIT: there is a bug in the derivative, as it only returns a single float value instead of a NumPy array. The correct code should be:
def relu_prime(data, epsilon=0.1):
    # 1 where the input exceeds epsilon, epsilon elsewhere (element-wise)
    gradients = 1. * (data > epsilon)
    gradients[gradients == 0] = epsilon
    return gradients
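A quick check (repeating the definition above so the snippet is self-contained, with illustrative inputs) shows that this version returns an element-wise gradient array rather than a single float:

import numpy as np

def relu_prime(data, epsilon=0.1):
    gradients = 1. * (data > epsilon)
    gradients[gradients == 0] = epsilon
    return gradients

print(relu_prime(np.array([-1.0, 0.5, 2.0])))
# per-element gradients: [0.1, 1.0, 1.0]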