I want to implement the DA-RNN from this paper. Page 3 of the paper describes the encoder network; specifically, I want to implement the input-attention equations, which (as I read them) score the k-th driving series at time t as

    e_t^k = v_e^T tanh(W_e [h_{t-1}; s_{t-1}] + U_e x^k)
    alpha_t^k = exp(e_t^k) / sum_i exp(e_t^i)

I want to confirm that these equations require the cell state `s` at every timestep. Usually in RNNs we just need the cell state from the previous timestep. So what I mean is: if we have a sequence of 5 timesteps, we use the hidden state `h` at all timesteps from 1 to 5, but we use the cell state `s` from only the 5th timestep (and not from all the timesteps).
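For concreteness, here is roughly how I'm planning to compute the score e_t^k for one timestep. This is just a sketch in TF 1.x; the parameter shapes follow the paper (W_e in R^{T x 2m}, U_e in R^{T x T}, v_e in R^T, stored transposed here), but the batched layout and names are my own assumptions:

    import tensorflow as tf

    m, T = 64, 10          # assumed: encoder hidden size m, window length T

    # Trainable attention parameters, shaped so batched matmuls work
    W_e = tf.get_variable("W_e", [2 * m, T])
    U_e = tf.get_variable("U_e", [T, T])
    v_e = tf.get_variable("v_e", [T, 1])

    def input_attention_score(h_prev, s_prev, x_k):
        """e_t^k = v_e^T tanh(W_e [h_{t-1}; s_{t-1}] + U_e x^k).

        h_prev, s_prev: (batch, m) hidden and cell state from t-1
        x_k:            (batch, T) the k-th driving series over the window
        returns:        (batch, 1) unnormalised score e_t^k
        """
        hs = tf.concat([h_prev, s_prev], axis=1)               # (batch, 2m)
        z = tf.tanh(tf.matmul(hs, W_e) + tf.matmul(x_k, U_e))  # (batch, T)
        return tf.matmul(z, v_e)                               # (batch, 1)

The alpha_t^k weights would then be a softmax of these scores over k. The key point is that this needs s_{t-1} at every t, which leads to my problem below.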
The `out, s = dynamic_rnn(...)` function in TensorFlow also gives me this output: `out` holds the hidden states computed at each timestep, and `s` is a tuple of `c` and `h` from the last timestep only. For example, if my input has shape `batch_size x max_timestep x num_features` and, let's say, `max_timestep = 5`, then `s.c` and `s.h` will contain the cell state and hidden state from the 5th timestep, not from timesteps 1, 2, 3, 4. However, the paper's notation uses the cell state `s` at time `t-1`, so over 5 timesteps I need the cell state at timesteps 1, 2, 3, 4, which I can't obtain through the `dynamic_rnn` function.
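To illustrate what I mean, a minimal TF 1.x example (the sizes are made up) showing what `dynamic_rnn` returns:

    import tensorflow as tf

    batch_size, max_timestep, num_features = 4, 5, 10
    num_units = 16

    inputs = tf.placeholder(tf.float32, [batch_size, max_timestep, num_features])
    cell = tf.nn.rnn_cell.LSTMCell(num_units)
    out, s = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

    print(out.shape)  # (4, 5, 16) -> hidden state h at every timestep
    print(s.c.shape)  # (4, 16)    -> cell state from the 5th timestep only
    print(s.h.shape)  # (4, 16)    -> hidden state from the 5th timestep only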
So, to summarise:

- Is my assumption correct that I need the cell state `s` at all timesteps to implement the equations above?
- If yes, how can I do this in TensorFlow? Should I write my own LSTM wrapper?
Update:

This answer resolved my issue. It turns out there is no direct function to obtain the cell state at each timestep; however, we can wrap the RNN cell so that it emits its cell state as part of its per-step output, and pass the wrapped cell to the `dynamic_rnn` function in TensorFlow.
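For anyone else hitting this, the wrapper I ended up with looks roughly like the sketch below (TF 1.x; the class name and the concat-then-split convention are my own choices, not from the linked answer). The cell's per-step output is redefined to carry both `c` and `h`, so `dynamic_rnn` hands back the cell state at every timestep:

    import tensorflow as tf

    class LSTMCellExposingState(tf.nn.rnn_cell.LSTMCell):
        """LSTM cell whose per-step output is [c; h] instead of just h."""

        @property
        def output_size(self):
            return 2 * self._num_units  # c and h concatenated

        def call(self, inputs, state):
            h, next_state = super(LSTMCellExposingState, self).call(inputs, state)
            # next_state is an LSTMStateTuple(c, h); emit both as the step output
            return tf.concat([next_state.c, h], axis=1), next_state

    cell = LSTMCellExposingState(num_units=16)
    inputs = tf.placeholder(tf.float32, [None, 5, 10])
    out, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

    # out: (batch, 5, 32); split back into per-timestep cell and hidden states
    all_c, all_h = tf.split(out, 2, axis=2)  # each (batch, 5, 16)

With `all_c` available at every timestep, the attention scores e_t^k from the paper can be computed for all t, not just the last step.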