
I want to implement the DA-RNN from the paper *A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction* (Qin et al., 2017). Page 3 of the paper describes the encoder network. Specifically, I want to implement the following equations:

$$e_t^k = \mathbf{v}_e^\top \tanh\left(\mathbf{W}_e[\mathbf{h}_{t-1}; \mathbf{s}_{t-1}] + \mathbf{U}_e \mathbf{x}^k\right)$$

$$\alpha_t^k = \frac{\exp(e_t^k)}{\sum_{i=1}^{n}\exp(e_t^i)}$$

where $\mathbf{h}_{t-1}$ and $\mathbf{s}_{t-1}$ are the encoder LSTM's hidden state and cell state at timestep $t-1$, and $\mathbf{x}^k$ is the $k$-th input driving series.

I want to confirm: does the above equation require the cell state s at every timestep? Usually in RNNs we only need the cell state from the previous timestep. What I mean is: if we have a batch of 5 timesteps, we use the hidden state h at all timesteps 1 to 5, but we use the cell state s only from the 5th timestep (not at all the timesteps).

The `out, s = tf.nn.dynamic_rnn(...)` call in TensorFlow also gives me this output: `out` holds the hidden states computed at each timestep, while `s` is a tuple of `c` and `h` from the last timestep only. For example, if my input has shape `batch_size x max_timestep x num_features` and `max_timestep = 5`, then `s.c` and `s.h` hold the cell state and hidden state from the 5th timestep, not from timesteps 1, 2, 3, 4. However, the paper's notation uses the cell state s at time t-1. So if we are talking about 5 timesteps, I need the cell state s at timesteps 1, 2, 3, 4 as well, which I can't obtain through `dynamic_rnn`.
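To make the shapes concrete, here is a minimal sketch of that behaviour under TF 1.x (the sizes are placeholder values of my own choosing):

```python
import tensorflow as tf  # TF 1.x

batch_size, max_timestep, num_features, num_units = 32, 5, 10, 64
inputs = tf.placeholder(tf.float32, [batch_size, max_timestep, num_features])

cell = tf.nn.rnn_cell.LSTMCell(num_units)
out, s = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# out: [batch_size, 5, num_units] -- hidden state h for every timestep 1..5
# s:   LSTMStateTuple(c, h)       -- c and h from timestep 5 only
```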

So to summarise:

  1. Is my assumption correct that I need the cell state s at all timesteps to implement the equations above?
  2. If yes, how can I do this in TensorFlow? Should I write my own LSTM wrapper?

Update:

This answer resolved my issue. It turns out there is no direct function to obtain the cell state at each timestep, but we can wrap the RNN cell and pass the wrapper to TensorFlow's `dynamic_rnn` function.
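For reference, here is a minimal sketch of that idea under TF 1.x. The class name `ExposeCellStateLSTM` is my own; the trick is to widen the cell's per-step output to `(h, c)` so `dynamic_rnn` stacks both across time:

```python
import tensorflow as tf  # TF 1.x

class ExposeCellStateLSTM(tf.nn.rnn_cell.LSTMCell):
    """LSTMCell whose per-step output is (h_t, c_t) instead of just h_t,
    so dynamic_rnn collects the cell state at every timestep too."""

    @property
    def output_size(self):
        # report a pair of outputs: one slot for h, one for c
        return (self._num_units, self._num_units)

    def call(self, inputs, state):
        h, new_state = super().call(inputs, state)
        # new_state is an LSTMStateTuple(c, h); emit c alongside h
        return (h, new_state.c), new_state

cell = ExposeCellStateLSTM(num_units=64)
inputs = tf.placeholder(tf.float32, [None, 5, 10])
(all_h, all_c), final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
# all_h, all_c: [batch_size, 5, 64] -- h and c for every timestep
```

`all_c[:, t - 1, :]` then gives the $\mathbf{s}_{t-1}$ needed for the attention equations above.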

  • I can answer yes to the first question. I recommend you refer to the decoding/training/evaluation part of [Neural Machine Translation with Attention](https://www.tensorflow.org/beta/tutorials/text/nmt_with_attention) if you want to do this in tensorflow. – giser_yugang Jul 08 '19 at 13:53
  • @giser_yugang thanks for the reply. I went through the link and found that the kind of attention used in `seq2seq` models doesn't use the cell state `c` in any of the computations; in their notation `c` means the context vector. So in the end I guess everyone feeds different kinds of inputs to an `mlp` net which they call the attention model. Anyway, I think there is no direct way to get the cell state vector `c` at every timestep. – Siddhant Tandon Jul 09 '19 at 08:55
