
There are many resources on how to obtain h and c from an LSTM. My LSTM has both return_sequences and return_state set to True, and I'm storing h and c in variables as you can see in the code below. This is during inference, after training the network. The encoder half of my model should be producing h and c vectors of length hidden_dim, but as you'll see below that is not the case:

    self.enc1 = LSTM(hidden_dim, return_sequences=True, return_state=True, name='enc1')
    self.emb_A = Embedding(output_dim = emb_dim, input_dim = vocabulary, input_length = None, mask_zero=True, name='embA')
               ...
    inference_encInput = Input(shape=(None,), dtype="int32", name="input_seq")

    temp = self.emb_A(inference_encInput)
    temp, h, c = self.enc1(temp)
    encoder_states = [h, c]
    enc_model = Model(inputs = inference_encInput, outputs = encoder_states)
        ...
    predicted_states = enc_model.predict(indices_arr)
    print("encoder states are "+str(predicted_states))

The result of that print statement is:

    encoder states are [array([[-0.3114952 , -0.19627409],
           [ 0.16007528,  0.72028404],
           [-0.7607165 ,  0.5128824 ]], dtype=float32), array([[-0.8645954 , -0.90217674],
           [ 0.31057465,  0.9236232 ],
           [-0.99791354,  0.99934816]], dtype=float32)]

My hidden dimension is only 2 because I'm just doing basic testing on extremely simple training data. The number of 2-dimensional vectors always matches the length of the sequence I'm trying to encode (3 in this case), which suggests I'm somehow getting a state for every timestep. But h and c are supposed to be just the FINAL hidden and cell states. I don't think simply taking the last one is correct either; there must be something else going on. I have no idea what I'm doing wrong here, especially since the decoder's state is obtained correctly on each timestep:

    new states for decoder are [array([[ 0.19158483, -0.16113694]], dtype=float32), array([[ 0.19398187, -0.37419504]], dtype=float32)]
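
For reference, here is a minimal standalone sketch (assuming tf.keras; the feature size and input values are made up for illustration) of what I expect return_state to give me: one (batch, hidden_dim) array for h and one for c, regardless of how many timesteps go in:

    import numpy as np
    from tensorflow.keras.layers import Input, LSTM
    from tensorflow.keras.models import Model

    hidden_dim = 2
    inp = Input(shape=(None, 4))        # (batch, timesteps, features), made-up feature size
    seq, h, c = LSTM(hidden_dim, return_sequences=True, return_state=True)(inp)
    m = Model(inputs=inp, outputs=[h, c])

    # one sample with 3 timesteps
    h_val, c_val = m.predict(np.random.rand(1, 3, 4))
    print(h_val.shape, c_val.shape)     # expected: (1, 2) (1, 2)
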
  • You can check out this post: https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/ – MBT Aug 08 '18 at 21:46
  • Yes, I have done exactly that, as far as I can tell; that's why I'm here, and the states still aren't right. – Sean Paulsen Aug 08 '18 at 22:05

1 Answer


If anyone ever sees this post, I figured it out. I was passing a flat list to model.predict(), something like model.predict([1, 15, 14, 2]), but Keras treats that as a batch of separate one-timestep samples, which is why I got one state per token. For an LSTM we want to feed all four timesteps as a single sample, so it needs to be a list of a list: model.predict([[1, 15, 14, 2]]).
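
A minimal sketch of the fix (the token ids here are just example values; enc_model and hidden_dim are from the question):

    import numpy as np

    indices = [1, 15, 14, 2]          # token ids for ONE input sequence (example values)

    # Wrong: shape (4,) looks like four separate one-timestep samples,
    # so predict() returns an (h, c) pair per token.
    # states = enc_model.predict(np.array(indices))

    # Right: add a batch dimension so it's one sample with four timesteps.
    h, c = enc_model.predict(np.array([indices]))    # input shape (1, 4)
    print(h.shape, c.shape)                          # (1, hidden_dim) each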
