
I am trying to understand the char-LSTM example linked here: char-lstm Julia example.

The function lstm_cell takes the previous state as its second parameter:
function lstm_cell(data::mx.SymbolicNode, prev_state::LSTMState, param::LSTMParam; num_hidden::Int=512, dropout::Real=0, name::Symbol=gensym())
However, in the section marked #stack LSTM cells:

next_state = lstm_cell(hidden, l_state, l_param, num_hidden=dim_hidden, dropout=dp,name=Symbol(name, "lstm$t"))
hidden = next_state.h
layer_param_states[i] = (l_param, next_state)

layer_param_states[i] is then overwritten with the next state: layer_param_states[i] = (l_param, next_state)
Why is this done here? Why is the previous state replaced with the next state?

Abhishek Kishore

1 Answer


Because layer_param_states stores the final states of the sequence. Note that in https://github.com/dmlc/MXNet.jl/blob/master/examples/char-lstm/lstm.jl#L110 the final state is grouped and will be used to compute the loss with the provided labels.

Just FYI, the Python example does exactly the same thing: https://github.com/apache/incubator-mxnet/blob/master/example/rnn/old/lstm.py#L167. The name last_states used there makes more sense.
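To make the pattern concrete, here is a minimal sketch in plain Julia with no MXNet dependency. ToyState, toy_cell and unroll are hypothetical stand-ins invented for illustration (toy_cell is not a real LSTM), and the sketch assumes, as the "lstm$t" name in the question suggests, that the stacking code runs inside a loop over time steps. Under that assumption, each slot in layer_param_states always holds the most recent state of its layer: it is read as the previous state at the next time step, and after the loop it holds the final state.

```julia
# Hypothetical stand-in for LSTMState: a cell value c and an output h.
struct ToyState
    c::Float64
    h::Float64
end

# Hypothetical stand-in for lstm_cell: a simple deterministic update,
# NOT a real LSTM. It only mimics "previous state in, next state out".
function toy_cell(x::Float64, prev::ToyState, w::Float64)
    c = prev.c + w * x
    return ToyState(c, tanh(c))
end

function unroll(inputs::Vector{Float64}, weights::Vector{Float64})
    # One (param, state) slot per layer, starting from a zero state.
    layer_param_states = [(w, ToyState(0.0, 0.0)) for w in weights]
    for x in inputs                       # loop over time steps
        hidden = x
        for i in 1:length(weights)        # stack the layers
            l_param, l_state = layer_param_states[i]   # state from the previous step
            next_state = toy_cell(hidden, l_state, l_param)
            hidden = next_state.h
            # Overwrite the slot so the next time step sees this state as "previous".
            layer_param_states[i] = (l_param, next_state)
        end
    end
    # After the last time step, each slot holds that layer's final state.
    return [state for (_, state) in layer_param_states]
end

final_states = unroll([1.0, 2.0, 3.0], [0.5, 1.0])
```

If the slot were not overwritten, every time step would start again from the initial zero state, and nothing would be carried along the sequence.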

Yizhi