I am trying to implement a stacked RNN with MultiRNNCell and GRUCell in TensorFlow.
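
For reference, this is roughly how I am constructing the stack (a minimal sketch; the layer count, unit sizes, and placeholder shapes are arbitrary):

import tensorflow as tf

# input is [batch, time, features]; sizes here are arbitrary
inputs = tf.placeholder(tf.float32, [None, None, 8])
cells = [tf.nn.rnn_cell.GRUCell(num_units=16) for _ in range(2)]
stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
outputs, final_states = tf.nn.dynamic_rnn(stacked_cell, inputs, dtype=tf.float32)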

From the default implementation of GRUCell, it can be seen that the "output" and the "state" of the GRUCell are the same:

class GRUCell(RNNCell):
  ...
  def call(self, inputs, state):
    ...
    new_h = u * state + (1 - u) * c
    return new_h, new_h

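(Here u is the update gate and c the candidate activation, so this is the standard GRU update h_t = u * h_{t-1} + (1 - u) * c, with the new hidden state h_t serving as both the output and the state.)
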
This makes sense, as it is consistent with the definition. However, when we stack such cells with MultiRNNCell, which is defined as:

class MultiRNNCell(RNNCell):
  ...
  def call(self, inputs, state):
    ...
    cur_inp = inputs
    new_states = []
    for i, cell in enumerate(self._cells):
      cur_state = state[i]  # each layer reads its own slot of the state tuple
      cur_inp, new_state = cell(cur_inp, cur_state)
      new_states.append(new_state)
    return cur_inp, new_states

(The code has been condensed to highlight the relevant bits.)

In this case, any GRUCell that is not the first one receives identical values for its "inputs" and "state". Essentially, it is operating on a single input: the output from the previous layer.

Since the values of the reset/update gates depend on a comparison of the two inputs (input vs. state), wouldn't this make the operation redundant, simply passing the values through straight from the first layer?
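
One way to sanity-check this that I considered (a rough sketch; shapes and sizes are arbitrary) is to compare the per-layer final states on random data. If the second layer were simply passing values through, its state should come out (nearly) identical to the first layer's:

import numpy as np
import tensorflow as tf

inputs = tf.placeholder(tf.float32, [1, 5, 8])
cells = [tf.nn.rnn_cell.GRUCell(8) for _ in range(2)]
stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
# final_states is a tuple with one entry per layer
_, final_states = tf.nn.dynamic_rnn(stacked_cell, inputs, dtype=tf.float32)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  s0, s1 = sess.run(final_states,
                    {inputs: np.random.randn(1, 5, 8).astype(np.float32)})
  print(np.allclose(s0, s1))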

It seems that this architecture for MultiRNNCell was mainly designed with LSTM cells in mind, since they keep their outputs and cell states separate, but it is not appropriate for GRU cells.
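
For comparison, the LSTM cell returns separate values for the two. Condensed in the same way as above, BasicLSTMCell looks roughly like (quoted from memory, so details may differ slightly):

class BasicLSTMCell(RNNCell):
  ...
  def call(self, inputs, state):
    ...
    new_c = c * sigmoid(f) + sigmoid(i) * tanh(j)
    new_h = tanh(new_c) * sigmoid(o)
    # the output (new_h) and the state (new_c, new_h) are distinct
    return new_h, LSTMStateTuple(new_c, new_h)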
