11

Is there any way to use the Rectified Linear Unit (ReLU) as the activation function of the hidden layer, instead of tanh() or sigmoid(), in Theano? The hidden layer is implemented as follows, and as far as I have searched on the internet, ReLU is not implemented inside Theano.

import theano.tensor as T

class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None, activation=T.tanh):
        # body elided in the question
        pass
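
For context, the elided body (in the style of the DeepLearning.net MLP tutorial) simply applies whichever callable is passed as `activation` to an affine transform of the input. A simplified sketch, not the exact tutorial code (the uniform initialization range here is arbitrary):

import numpy
import theano
import theano.tensor as T


class HiddenLayer(object):
    """Sketch of a tutorial-style hidden layer; weight initialization simplified."""

    def __init__(self, rng, input, n_in, n_out, W=None, b=None, activation=T.tanh):
        if W is None:
            W_values = numpy.asarray(
                rng.uniform(low=-0.1, high=0.1, size=(n_in, n_out)),
                dtype=theano.config.floatX)
            W = theano.shared(value=W_values, name='W')
        if b is None:
            b = theano.shared(value=numpy.zeros(n_out, dtype=theano.config.floatX),
                              name='b')
        self.W = W
        self.b = b
        lin_output = T.dot(input, self.W) + self.b
        # `activation` is just a callable applied to the affine output, so any
        # function (tanh, sigmoid, or a hand-written relu) can be passed in.
        self.output = lin_output if activation is None else activation(lin_output)
        self.params = [self.W, self.b]
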
Amir
A.M.

5 Answers

17

relu is easy to do in Theano:

switch(x<0, 0, x)

To use it in your case, make a Python function implementing relu and pass it as activation:

import theano.tensor

def relu(x):
    return theano.tensor.switch(x < 0, 0, x)

HiddenLayer(..., activation=relu)

Some people use this implementation: x * (x > 0)

UPDATE: Newer Theano versions have theano.tensor.nnet.relu(x) available.
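
For example, assuming a Theano build that includes it, the built-in can be used directly; a minimal sketch (the commented HiddenLayer call refers to the class from the question):

import numpy
import theano
import theano.tensor as T

x = T.matrix('x')
f = theano.function([x], T.nnet.relu(x))  # requires a Theano version that ships nnet.relu

data = numpy.random.randn(3, 4).astype(theano.config.floatX)
print(f(data))  # negative entries are clamped to zero

# It can also be passed straight in as the layer activation:
# HiddenLayer(..., activation=T.nnet.relu)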

nouiz
  • How is the non-differentiability at zero dealt with here? – Chet Jan 20 '15 at 21:26
  • @nouiz I just installed Theano on my laptop. The library does not include nnet.relu. However, I can use nnet.relu on a desktop machine on which I installed Theano a few days back. What could be the reason? – Amir Jan 09 '16 at 20:45
  • @Amir, this is because they don't have the same Theano version. The one without relu uses the last released Theano version, 0.7; the one with relu uses the development version (it is stable and we recommend people use it): http://www.deeplearning.net/software/theano/install.html#bleeding-edge-install-instructions – nouiz Jan 14 '16 at 14:45
8

UPDATE: The latest version of Theano has native support for ReLU: T.nnet.relu, which should be preferred over custom solutions.
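
On a Theano version that ships the built-in, it could be added to the benchmark below as a fourth variant (a sketch; the timings reported below do not include it):

import theano.tensor as T

def relu4(x):
    # Built-in ReLU; only available on recent Theano versions.
    return T.nnet.relu(x)

# then add relu4 to the lists iterated over in the loops below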

I decided to compare the speed of the different solutions, since it matters a lot for NNs. I compared the speed of the function itself and of its gradient: for the forward pass the switch version is fastest, while the gradient is fastest for x * (x > 0). All the computed gradients are correct.

import numpy
import theano
import theano.tensor as T

def relu1(x):
    return T.switch(x < 0, 0, x)

def relu2(x):
    return T.maximum(x, 0)

def relu3(x):
    return x * (x > 0)


# Benchmark each variant on a 1000x1000 matrix (run inside IPython for %timeit).
z = numpy.random.normal(size=[1000, 1000])
for f in [relu1, relu2, relu3]:
    x = theano.tensor.matrix()
    fun = theano.function([x], f(x))
    %timeit fun(z)
    assert numpy.all(fun(z) == numpy.where(z > 0, z, 0))

Output: (time to compute ReLU function)
>100 loops, best of 3: 3.09 ms per loop
>100 loops, best of 3: 8.47 ms per loop
>100 loops, best of 3: 7.87 ms per loop

for f in [relu1, relu2, relu3]:
    x = theano.tensor.matrix()
    fun = theano.function([x], theano.grad(T.sum(f(x)), x))
    %timeit fun(z)
    assert numpy.all(fun(z) == (z > 0))

Output: (time to compute the gradient)
>100 loops, best of 3: 8.3 ms per loop
>100 loops, best of 3: 7.46 ms per loop
>100 loops, best of 3: 5.74 ms per loop

Finally, let's compare this with how the gradient should ideally be computed (the fastest way is just the mask x > 0):

x = theano.tensor.matrix()
fun = theano.function([x], x > 0)
%timeit fun(z)
Output:
>100 loops, best of 3: 2.77 ms per loop

So Theano generates suboptimal code for the gradient. IMHO, the switch version should be preferred today.

Alleo
  • Is this from [here](https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/scripts/benchmark/time_relu.py)? Note that when you care about the GPU speed, `T.maximum` is the fastest. See also [here](https://github.com/Theano/Theano/issues/2698). – Albert Mar 29 '15 at 13:58
  • @Albert, no, I decided to compare the versions I found here (unfortunately I don't have GPU, so these are CPU results). Thanks for the first link! – Alleo Mar 29 '15 at 18:10
  • Some follow up discussion about the speed is [here](https://github.com/Theano/Theano/issues/2698). – Albert Apr 14 '15 at 08:18
  • It seems that relu is absent from theano.tensor.nnet. According to http://deeplearning.net/software/theano/library/tensor/nnet/nnet.html#theano.tensor.nnet.relu it should be present in 0.7, but `pip3 show theano` reports Theano 0.7.0 (installed in /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages), and `theano.tensor.nnet.relu` raises `AttributeError: 'module' object has no attribute 'relu'`. Has anybody faced this? – Oleksandr Khryplyvenko Aug 29 '15 at 11:00
  • I've just got a clue about it. If you install Theano using pip (even from scratch with the -I option), the stable 0.7 version gets installed, and for now (at the moment I'm writing this comment) relu is absent there. But if you install the bleeding-edge version with `pip3 install --upgrade --no-deps git+git://github.com/Theano/Theano.git`, then relu appears in theano.tensor.nnet. – Oleksandr Khryplyvenko Aug 29 '15 at 11:14
1

I think it is more precise to write it this way:

x * (x > 0.) + 0. * (x < 0.)
  • `0. * (x < 0.)` will get optimized away. So the executed formula will be `x * (x > 0.)` – nouiz Oct 24 '14 at 20:35
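
As the comment says, the constant term is removed during optimization; one way to check (a sketch, assuming a standard Theano install) is to print the compiled graphs and see that both versions reduce to the same computation:

import theano
import theano.tensor as T

x = T.matrix('x')
f_long = theano.function([x], x * (x > 0.) + 0. * (x < 0.))
f_short = theano.function([x], x * (x > 0.))

# Both optimized graphs should be identical, i.e. the 0. * (x < 0.) term is gone.
theano.printing.debugprint(f_long)
theano.printing.debugprint(f_short)
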
1

I wrote it like this:

lambda x: T.maximum(0,x)

or:

lambda x: x * (x > 0)
grin
0

The function is very simple in Python:

def relu(input):
    output = max(input, 0)
    return output
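
This works for plain Python scalars; for NumPy arrays or Theano tensors an elementwise maximum is needed instead (a sketch, not part of the original answer):

import numpy
import theano.tensor as T

def relu_numpy(a):
    # Elementwise ReLU for NumPy arrays; the builtin max() would raise here.
    return numpy.maximum(a, 0)

def relu_theano(x):
    # Symbolic elementwise ReLU for use inside a Theano graph.
    return T.maximum(x, 0)

print(relu_numpy(numpy.array([-1.0, 2.0, -3.0])))  # [0. 2. 0.]
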
Not a machine