
This is the API I am looking at: https://pytorch.org/docs/stable/nn.html#gru

It outputs:

  1. output of shape (seq_len, batch, num_directions * hidden_size)
  2. h_n of shape (num_layers * num_directions, batch, hidden_size)

For a GRU with more than one layer, I wonder how to fetch the hidden state of the last layer: should it be h_n[0] or h_n[-1]?

What if it's bidirectional? How should the slicing be done to obtain the last-layer hidden states of the GRU in both directions?


1 Answer


The documentation of nn.GRU is clear about this. Here is an example to make it more explicit:

For a unidirectional GRU/LSTM with more than one hidden layer:

output - contains the output features h_t from the last layer, for every timestep t
h_n - contains the hidden state at the last timestep, for all layers

To get the hidden state of a given layer at the last timestep, index h_n along its first (layer) dimension:

    first_hidden_layer_last_timestep = h_n[0]
    last_hidden_layer_last_timestep = h_n[-1]

where the n in h_n denotes the last timestep, i.e. the sequence length.
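
For instance, here is a minimal runnable check (the sizes are arbitrary, chosen only for illustration): for a unidirectional GRU, h_n[-1] must coincide with output at the last timestep, since both describe the last layer there.

    import torch
    import torch.nn as nn

    # Arbitrary sizes, for illustration only
    seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2

    gru = nn.GRU(input_size, hidden_size, num_layers)  # unidirectional
    x = torch.randn(seq_len, batch, input_size)
    output, h_n = gru(x)

    # output: (seq_len, batch, hidden_size)   -> last layer, every timestep
    # h_n:    (num_layers, batch, hidden_size) -> every layer, last timestep
    print(torch.equal(output[-1], h_n[-1]))  # True: h_n[-1] is the last layer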


This is because the description of the num_layers parameter says:

num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two GRUs together to form a stacked GRU, with the second GRU taking in outputs of the first GRU and computing the final results.

So it is natural and intuitive that the hidden states are also returned in the same layer order: h_n[0] for the first layer and h_n[-1] for the last.
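
As for the bidirectional part of the question: the docs give h_n the shape (num_layers * num_directions, batch, hidden_size), so one way to do the slicing is to separate the layer and direction axes with view and then index, as in this sketch (sizes again arbitrary):

    import torch
    import torch.nn as nn

    # Arbitrary sizes, for illustration only
    seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2

    gru = nn.GRU(input_size, hidden_size, num_layers, bidirectional=True)
    x = torch.randn(seq_len, batch, input_size)
    output, h_n = gru(x)  # h_n: (num_layers * 2, batch, hidden_size)

    # Separate the layer and direction axes
    h_n = h_n.view(num_layers, 2, batch, hidden_size)
    fwd = h_n[-1, 0]  # last layer, forward direction:  (batch, hidden_size)
    bwd = h_n[-1, 1]  # last layer, backward direction: (batch, hidden_size)

    # Cross-check against `output`, whose last dim stacks [forward, backward]:
    print(torch.equal(output[-1, :, :hidden_size], fwd))  # True
    print(torch.equal(output[0, :, hidden_size:], bwd))   # True

Equivalently, with the flat layout, h_n[-2] and h_n[-1] should be the last layer's forward and backward states, respectively.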

  • I think what you were thinking of is `output`, "tensor containing the output features h_t from the last layer of the GRU". `h_n` only contains the hidden state at the last timestep of all hidden layers. Note the dimensions of these tensors. – zyxue Jan 18 '19 at 04:32
  • You're right, I meant to write `output`; I have now updated the answer. Thanks for correcting me! – kmario23 Jan 18 '19 at 05:06
  • It wasn't obvious to me. How do you know it's not the other way around, e.g. the first layer is `h_n[-1]`? – zyxue Jan 18 '19 at 05:09
  • I have added some explanation based on the input parameters description from the docs. +1 – kmario23 Jan 18 '19 at 05:17
  • Intuition could be wrong, but I confirmed it myself: the hidden state of the last GRU layer is supposed to equal `output` at the last timestep – zyxue Jan 18 '19 at 05:21
  • @zyxue good! I want to ask a follow-up question. For example, for a two-layer GRU, if we get a hidden state tensor of shape `torch.Size([2, 1, 1500])`, how can we get a single vector out of it for the last hidden layer? Should we reshape first and then take a mean to get a 1D vector? – kmario23 Jan 18 '19 at 05:28
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/186900/discussion-between-zyxue-and-kmario23). – zyxue Jan 18 '19 at 05:30