import numpy as np 

alpha = 0.0251  # learning rate for gradient descent, found through trial and error

# sigmoid activation; with deriv=True, x is expected to be the sigmoid's
# output, so x*(1-x) is the sigmoid's derivative at that point
def nonlinear(x, deriv=False):
  if deriv:
    return x*(1-x)
  return 1/(1+np.exp(-x))

#seed
np.random.seed(1)

#testing sample
test_x = np.array([[251,497,-246],
              [299,249,50],
              [194,180,14],
              [140,148,-8],
              [210,140,70]])
#Input Array - This input will be taken directly from a Pong game
X = np.array([[198,200,-2],
          [90, 280,-190],
          [84, 256,-172],
          [140,240,-100],
          [114,216,-102],
          [72, 95,-23],
          [99, 31, 68],
          [144, 20, 124],
          [640, 216,424],
          [32, 464,-432],
          [176, 64,112],
          [754, 506,248],
          [107, 104,3],
          [116,101,15]])

#output array - 1 (move up) if ball_pos - paddle > 0, else 0 (move down)
Y = np.array([[0,0,0,0,0,0,1,1,1,0,1,1,1,1]]).T

#weight matrices initialized randomly with mean 0
syn0 = 2*np.random.random((3,14))-1   # input (3) -> hidden (14)
syn1 = 2*np.random.random((14,14))-1  # hidden (14) -> output (14 units; only the first is read out below)

for j in range(60000):

  #forward propagation 
  l0 = X
  l1 = nonlinear(np.dot(l0, syn0))
  l2 = nonlinear(np.dot(l1, syn1))

  #how much did we miss 
  l2_error = Y - l2 

  #multiply how much we missed by the slope of the sigmoid at the values in l2
  l2_delta = l2_error * nonlinear(l2, True)

  #how much did l1 contribute to l2 error 
  #(according to the weights)
  l1_error = l2_delta.dot(syn1.T)

  #multiply the l1 error by the slope of the sigmoid at the values in l1
  l1_delta = l1_error*nonlinear(l1,True)

  #update weights
  syn1 += alpha * (l1.T.dot(l2_delta))
  syn0 += alpha * (l0.T.dot(l1_delta))

  # display error 
  if j % 10000 == 0:
    print("ERROR: " + str(np.mean(np.abs(l2_error))))


#Testing Forward propagation
l0_test = test_x
l1_test = nonlinear(np.dot(l0_test,syn0))
l2_test = nonlinear(np.dot(l1_test,syn1))

#Collect just the first output unit for each test sample
l2_test_output = []
for x in range(len(l2_test)):
  l2_test_output.append(l2_test[x][0])

print("Test Output")
print(l2_test_output)

#Collect just the first output unit for each training sample (the probabilities reported below)
l2_output = []
for x in range(len(l2)):
  l2_output.append(l2[x][0])

print("Output")
print(l2_output)

This code is supposed to take in a group of three numbers [value_1, value_2, value_1 - value_2] and return "0" if the difference between the first and second value is negative, or "1" if the difference is positive. So far it actually works very well.
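For reference, that labelling rule can be written out directly in numpy. This is only an illustrative sketch; label and samples are hypothetical names, not part of the program above:

import numpy as np

# A sample is [value_1, value_2, value_1 - value_2]; the label is 1
# if the difference is positive (move up), otherwise 0 (move down).
def label(sample):
  return 1 if sample[0] - sample[1] > 0 else 0

samples = np.array([[198,200,-2],
                    [99, 31, 68]])
print([label(row) for row in samples])  # -> [0, 1], matching the first and seventh entries of Y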

Here is the output:

ERROR: 0.497132186092
ERROR: 0.105081486632
ERROR: 0.102115299177
ERROR: 0.100813655802
ERROR: 0.100042420179
ERROR: 0.0995185781466
Test Output
[0.0074706006801269686, 0.66687458928464094, 0.66687458928463983, 0.66686236694464551, 0.98341439176739631]
Output
[0.66687459245609326, 0.00083944690766060215, 0.00083946471285455484, 0.0074706634783305243, 0.0074706634765733968, 0.007480987498372226, 0.99646513183073093, 0.99647100131874755, 0.99646513180692531, 0.00083944572383107523, 0.99646513180692531, 0.98324165810211861, 0.66687439729829612, 0.66687459321626519]

As you can see, with alpha = 0.0251 (the learning rate for gradient descent, found through trial and error), the final error is only about 9.95%.
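For anyone repeating that search, here is a rough sketch of how the trial and error could be automated. train is a hypothetical helper that simply reruns the same 3-14-14 training loop; it reuses the X, Y arrays and the nonlinear() function defined above, and the candidate alphas are arbitrary examples:

def train(X, Y, alpha, iters=60000):
  # rerun the same 3-14-14 network as above and return the final mean error
  np.random.seed(1)
  syn0 = 2*np.random.random((3,14))-1
  syn1 = 2*np.random.random((14,14))-1
  for _ in range(iters):
    l1 = nonlinear(np.dot(X, syn0))
    l2 = nonlinear(np.dot(l1, syn1))
    l2_error = Y - l2
    l2_delta = l2_error * nonlinear(l2, True)
    l1_delta = l2_delta.dot(syn1.T) * nonlinear(l1, True)
    syn1 += alpha * l1.T.dot(l2_delta)
    syn0 += alpha * X.T.dot(l1_delta)
  return np.mean(np.abs(l2_error))

# compare a few candidate learning rates and keep the one with the lowest final error
for candidate in (0.001, 0.01, 0.0251, 0.1):
  print(candidate, train(X, Y, candidate))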

Since I made this program, I've learned that leaky ReLU is a better alternative to the sigmoid function, since it optimizes and learns faster. I want to implement the leaky ReLU function using numpy in this program, but I'm not sure where to start, and in particular I don't know what its derivative is.

How can I implement leaky ReLU in this neural net?

Ikechukwu Anude
  • It optimizes and learns faster than sigmoid only under specific conditions (it lacks some of sigmoid's drawbacks but has its own, e.g. the so-called "dead ReLU" problem; it's all much more complicated). Also, if you need your net to return values between 0 and 1, you will need a sigmoid or a close alternative anyway, as ReLU is unbounded. If you want to design neural nets yourself, I'd start somewhere around here: https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b – Bob Nov 29 '17 at 14:56
  • Thank you for that information. I would like my net to return values between 0 and 1 because I need it to return classical probabilities. I have a question about the ReLU function: why would someone need a function that doesn't output a value between 0 and 1, and does that mean that the sigmoid and ReLU functions aren't interchangeable? Also, I might have accidentally flagged your comment. – Ikechukwu Anude Nov 30 '17 at 23:28

1 Answer


I'd like to add here that there is actually a wide range of ReLU-like activation functions that can be used instead of the standard ReLU activation:

  • Exponential linear unit (ELU) behaves like ReLU for positive inputs and saturates smoothly for negative ones: f(x) = x for x > 0, and f(x) = a*(exp(x) - 1) for x <= 0.

  • Scaled exponential linear unit (SELU) has been published very recently. It's an extension of ELU, with a specific choice of parameter, which has an additional normalizing effect and helps to learn faster.

Here's the list of all activations and their derivatives.
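To address the leaky ReLU part of the question directly, here is a minimal sketch of a leaky ReLU and its derivative in numpy, written in the same style as the nonlinear() helper in the question; the negative-side slope of 0.01 is just a common default, not a requirement:

import numpy as np

def leaky_relu(x, deriv=False):
  a = 0.01  # slope for negative inputs; 0.01 is a common default
  if deriv:
    # the derivative is 1 where x > 0 and a elsewhere
    return np.where(x > 0, 1.0, a)
  return np.where(x > 0, x, a * x)

print(leaky_relu(np.array([-2.0, 3.0])))        # -> [-0.02  3.  ]
print(leaky_relu(np.array([-2.0, 3.0]), True))  # -> [ 0.01  1.  ]

Because a leaky ReLU's output has the same sign as its input, calling leaky_relu(l1, True) on the layer's output (the way the existing code calls nonlinear(l1, True)) still gives the correct slope. And as the comments above point out, ReLU-like activations are unbounded, so if the network is expected to return values between 0 and 1 it makes sense to use the leaky ReLU only on the hidden layer and keep the sigmoid on the output layer.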

Maxim