22

Can someone explain to me the difference between the activation and recurrent_activation arguments passed when initialising a Keras LSTM layer?

According to my understanding, an LSTM has 4 layers. Please explain what the default activation functions of each layer are if I do not pass any activation argument to the LSTM constructor.

Mayank Uniyal

5 Answers

22

In the code (around line 1932 of keras/layers/recurrent.py):

i = self.recurrent_activation(z0)        # input gate
f = self.recurrent_activation(z1)        # forget gate
c = f * c_tm1 + i * self.activation(z2)  # new cell state (candidate uses activation)
o = self.recurrent_activation(z3)        # output gate
h = o * self.activation(c)               # hidden state (activation applied to cell state)

recurrent_activation activates the input, forget, and output gates.

activation is applied to the candidate cell state and the hidden state.
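The lines above can be sketched as one LSTM step in plain NumPy. This is a minimal illustration, not the actual Keras implementation: the function name `lstm_step` is made up, and it uses the exact sigmoid for the gates (Keras has historically defaulted recurrent_activation to hard_sigmoid).

```python
import numpy as np

def sigmoid(x):
    # Stand-in for recurrent_activation (applied to the gates)
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(z, c_tm1):
    """One LSTM cell step, mirroring the Keras source lines quoted above.

    z is the pre-activation vector already split into four equal parts
    (z0..z3); c_tm1 is the previous cell state.
    """
    z0, z1, z2, z3 = np.split(z, 4)
    i = sigmoid(z0)                    # input gate   (recurrent_activation)
    f = sigmoid(z1)                    # forget gate  (recurrent_activation)
    c = f * c_tm1 + i * np.tanh(z2)    # candidate passes through activation
    o = sigmoid(z3)                    # output gate  (recurrent_activation)
    h = o * np.tanh(c)                 # activation applied to the cell state
    return h, c

rng = np.random.default_rng(0)
z = rng.standard_normal(4 * 3)   # pre-activations for 3 units
c_prev = np.zeros(3)
h, c = lstm_step(z, c_prev)
```

Because the gates are squashed into (0, 1) and the candidate into (-1, 1), the resulting hidden state h is always bounded in (-1, 1).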

peikuo
12

An LSTM unit has 3 gates called the input, forget, and output gates, in addition to a candidate cell state (g) and the cell state itself (c).

The build method in the LSTMCell class contains the implementation where these activations are called (https://github.com/keras-team/keras/blob/master/keras/layers/recurrent.py#L1892).

The recurrent_activation argument applies to the input, forget, and output gates. The default value for this argument is a hard-sigmoid function. The activation argument applies to the candidate hidden state and output hidden state. The default value for this argument is a hyperbolic tangent function.
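To see why the hard sigmoid works as a cheap stand-in for the exact sigmoid on the gates, here is a small comparison. The clip(0.2 * x + 0.5, 0, 1) form below is the classic Keras definition; treat it as an assumption, since the exact formula can vary between versions.

```python
import numpy as np

def hard_sigmoid(x):
    # Piecewise-linear approximation of the sigmoid:
    # clip(0.2 * x + 0.5, 0, 1) -- avoids the exp() call.
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

xs = np.linspace(-4, 4, 9)
# The two agree closely near 0 and saturate at the same 0/1 limits.
print(np.max(np.abs(hard_sigmoid(xs) - sigmoid(xs))))
```

Both functions map into [0, 1], which is what makes them suitable as gates: the output can be read as "how much of this signal to let through".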

4

So when an LSTM layer is called, two kinds of operations are performed:

  • inner recurrent-activation computations, which update the inner memory cell - for this recurrent_activation is used (the default is hard_sigmoid).
  • computing the final output of the layer. Here an activation function is applied (the default is tanh).
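A tiny numeric sketch of the first bullet, the inner memory-cell update (using the exact sigmoid instead of the hard_sigmoid default, for simplicity): because gate outputs lie in (0, 1), a nearly open forget gate combined with a nearly closed input gate leaves the cell state almost unchanged.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Gate outputs lie in (0, 1): they scale how much of the old cell state
# survives (forget gate) and how much new candidate enters (input gate).
c_prev = 2.0
f = sigmoid(4.0)            # forget gate nearly open  -> keep most of c_prev
i = sigmoid(-4.0)           # input gate nearly closed -> admit little new info
candidate = math.tanh(1.0)  # candidate squashed by the (tanh) activation
c_new = f * c_prev + i * candidate
# c_new stays close to c_prev, since f ~ 0.982 and i ~ 0.018
```

With the gates reversed (f near 0, i near 1), the cell would instead discard its history and adopt the new candidate.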

Here you could read the details.

Marcin Możejko
    I just read the article you shared. What I understood is that there are four layers in a single LSTM block: 1. a forget layer, which decides what to forget from the cell state; 2. an input gate layer, which decides which values of our cell state we'll update; 3. a tanh layer, which creates a vector of new candidate values that could be added to the state; 4. finally, a sigmoid layer, which decides what we're going to output. Now could you please tell me, out of these four, which are recurrent activations and which are normal activations? – Mayank Uniyal Jul 06 '17 at 14:03
0

According to the explanation by Andrew Ng in this video, the three gates, namely the update (input), forget, and output gates, require a sigmoid-type activation function. Hence recurrent_activation in the Keras documentation refers to these activation values.

The activation required for the update candidate and for the output is tanh. So activation in the Keras documentation corresponds to these.

-3

I verified your question and below is my conclusion:

  • activation: tanh
  • recurrent_activation: sigmoid (hard_sigmoid by default in Keras)

And for the 4 gates in one cell:

  • i, f, o use recurrent_activation
  • the candidate C and the hidden-state update h use activation

Dr.Wang