
I am working on a generative chatbot based on seq2seq in Keras. I used code from this site: https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/

My models look like this:

# define training encoder
encoder_inputs = Input(shape=(None, n_input))
encoder = LSTM(n_units, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]

# define training decoder
decoder_inputs = Input(shape=(None, n_output))
decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(n_output, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# define inference encoder
encoder_model = Model(encoder_inputs, encoder_states)

# define inference decoder
decoder_state_input_h = Input(shape=(n_units,))
decoder_state_input_c = Input(shape=(n_units,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
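
For context, these two inference models are used in a step-by-step decoding loop like the one in the tutorial; roughly like this sketch (the all-zeros start frame and the n_steps cutoff are placeholders of mine, not part of the tutorial code above):

import numpy as np

def predict_sequence(infenc, infdec, source, n_steps, cardinality):
    # encode the source sequence into the initial decoder state
    state = infenc.predict(source)
    # first decoder input: an all-zeros one-hot frame used as the "start" token
    target_seq = np.zeros((1, 1, cardinality))
    output = []
    for _ in range(n_steps):
        # predict the next token distribution and the updated LSTM states
        yhat, h, c = infdec.predict([target_seq] + state)
        output.append(yhat[0, 0, :])
        # feed the prediction and the new states back in for the next step
        state = [h, c]
        target_seq = yhat
    return np.array(output)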

This neural network is designed to work with one-hot encoded vectors, and the input to this network looks, for example, like this:

[[[0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0.]
  [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0.]]
  [[0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.
   0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.
   0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0.]]]

How can I rebuild these models to work with words? I would like to use a word embedding layer, but I have no idea how to connect an embedding layer to these models.

My input should be [[1,5,6,7,4], [4,5,7,5,4], [7,5,4,2,1]], where the integers are representations of words.
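
The relationship between that integer form and the one-hot form shown above is just an index lookup; for example (a sketch of mine, assuming a vocabulary of 51 entries as in the vectors above):

import numpy as np

def one_hot(sequences, vocab_size):
    # sequences: list of equal-length lists of word indices
    seq = np.asarray(sequences, dtype='int64')
    out = np.zeros(seq.shape + (vocab_size,), dtype='float32')
    for i, sample in enumerate(seq):
        for t, idx in enumerate(sample):
            out[i, t, idx] = 1.0
    return out

X = one_hot([[1, 5, 6, 7, 4], [4, 5, 7, 5, 4], [7, 5, 4, 2, 1]], vocab_size=51)
# X.shape == (3, 5, 51), i.e. the 3D shape the current encoder expects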

I tried everything but I'm still getting errors. Can you help me, please?


2 Answers


I finally got it working. Here is the code:

# shared embedding layer, used by both the encoder and the decoder
Shared_Embedding = Embedding(output_dim=embedding, input_dim=vocab_size, name="Embedding")

# define training encoder
encoder_inputs = Input(shape=(sentenceLength,), name="Encoder_input")
encoder = LSTM(n_units, return_state=True, name='Encoder_lstm') 
word_embedding_context = Shared_Embedding(encoder_inputs) 
encoder_outputs, state_h, state_c = encoder(word_embedding_context) 
encoder_states = [state_h, state_c] 
decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True, name="Decoder_lstm")

# define training decoder
decoder_inputs = Input(shape=(sentenceLength,), name="Decoder_input")
word_embedding_answer = Shared_Embedding(decoder_inputs) 
decoder_outputs, _, _ = decoder_lstm(word_embedding_answer, initial_state=encoder_states) 
decoder_dense = Dense(vocab_size, activation='softmax', name="Dense_layer") 
decoder_outputs = decoder_dense(decoder_outputs) 

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# define inference encoder
encoder_model = Model(encoder_inputs, encoder_states)

# define inference decoder
decoder_state_input_h = Input(shape=(n_units,), name="H_state_input")
decoder_state_input_c = Input(shape=(n_units,), name="C_state_input") 
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c] 
decoder_outputs, state_h, state_c = decoder_lstm(word_embedding_answer, initial_state=decoder_states_inputs) 
decoder_states = [state_h, state_c] 
decoder_outputs = decoder_dense(decoder_outputs)

decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)

"model" is training model encoder_model and decoder_model are inference models


In the FAQ section of the example linked below, they show how to use embeddings with seq2seq. I'm currently figuring out the inference step myself; I'll post here when I get it. https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
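
Roughly, the change described there is to feed the integer sequences through an Embedding layer before each LSTM, along these lines (my own sketch of the idea; num_encoder_tokens, num_decoder_tokens and latent_dim are placeholders):

from keras.layers import Input, LSTM, Dense, Embedding
from keras.models import Model

# encoder: integer word indices -> embeddings -> LSTM states
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_emb)
encoder_states = [state_h, state_c]

# decoder: integer word indices -> embeddings -> LSTM -> softmax over the vocabulary
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
dec_out = LSTM(latent_dim, return_sequences=True)(dec_emb, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(dec_out)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)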

  • Yes, there is an example of word embedding there, but I couldn't understand how to create the inference model with an embedding. Secondly, can you help me with what the input to this network should look like? I am trying to use input like this: [[1,5,6,7,4], [4,5,7,5,4], [7,5,4,2,1]], but I get an error that the network expects a 3D shape. – Lukáš Richtarik Mar 26 '18 at 20:20
  • I'm battling the inference issue myself; I'll post it here when I get it. For the input shape problem, my input array is a list of articles, each a list of integers (similar to your example). The input of my network looks like this: `encoder_inputs = Input(shape=(None, ))` `encoder_embedding = Embedding(amount_encoder_tokens, latent_dim)(encoder_inputs)` – emericw Mar 27 '18 at 16:39
  • Posting the full error and some code snippets might help me resolve the issue. – emericw Mar 27 '18 at 16:43
  • My decoder training model: `decoder_inputs = Input(shape=(None,)) x = Embedding(MAX_SENTENCE_LENGTH, len(vocab))(decoder_inputs) x = LSTM(len(vocab), return_sequences=True)(x, initial_state=encoder_states) decoder_outputs = Dense(MAX_SENTENCE_LENGTH, activation='softmax')(x)` My error: Error when checking target: expected dense_1 to have 3 dimensions, but got array with shape (10, 15). My input is a list of integer lists: [[4,5,8,2,1,4,5,6,9,8,12,15,47,15,12], [....] ...] – Lukáš Richtarik Apr 03 '18 at 10:53
  • @emericw Did you figure out inference? – user2258651 Apr 22 '18 at 16:15
  • @LukášRichtarik Sorry for the late response. I see an error in your code: the dense layer shouldn't receive the max length of your sentence but the number of decoder tokens. That being said, it doesn't explain your error. Although we are using the embedding layer, we still need one-hot encoding for training, because the network can't accurately predict a number in that range with a single output. So your decoder_target_data array should just be regular one-hot encoded, while encoder_input_data and decoder_input_data can be lists of integers. – emericw Apr 23 '18 at 19:50
  • @user2258651 Yes I did! Feel free to ask any questions. – emericw Apr 23 '18 at 19:52
  • @emericw This is what I have for inference and it seems to work. Is it correct? https://pastebin.com/rEV7tMnd Also, I'm getting the same output for every input, despite training on ~4 million dialog pairs for a model with ~300k parameters and a vocab size of 5000. Any insight into what I might be doing wrong? – user2258651 Apr 23 '18 at 23:03
  • Not exactly; it seems pretty much the same as mine. The only difference from my code is that I don't initialize a new embedding layer and then add weights to it; I pull the embedding layer, with its weights, from the model. Are you sure nothing is wrong with the training and that the weights are the same as in the training model? – emericw Apr 24 '18 at 20:31
  • Would you mind posting a GitHub gist of your full code (or at least the parts where the model is defined, inference is performed, and the output is produced)? I'm curious how you actually perform inference if you require the full sequence length in the inference decoder model. – user2258651 Apr 24 '18 at 22:37
  • To answer your questions: the embedding matrix is the same one I use to initialize my training layer, and both are set to trainable=False. I'm not sure the training is right. In fact, it seems the problem is in the training and inference loop, because without training I get variance in the output, but after training I don't. – user2258651 Apr 24 '18 at 22:44
  • http://ronsoros.github.io/?3f6da796fe5e98b71f8b32b30981c2efa62c8cf8 Here you can see the code you asked for. I personally don't set the embedding layer's trainable to False, but it shouldn't make any difference. – emericw Apr 26 '18 at 17:03
  • @LukášRichtarik posted his example in the meantime; it's very similar to mine. – emericw Apr 28 '18 at 14:46
  • Hm... it seems there is some problem in my model. When I train this model with 1000 sentences, it works perfectly. But when I train the model with 5000 sentences, the accuracy is 99% yet the model predicts responses randomly. Any idea where the mistake could be? – Lukáš Richtarik Apr 28 '18 at 16:39
  • Not exactly; I'm very much a beginner myself, sorry. – emericw Apr 29 '18 at 09:37