I'm trying to create a chatbot using an RNN in TensorFlow, following this introduction: https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
The model in that example is character-level, but I want to build a word-level model. The tutorial's "Bonus FAQ" section gives a brief outline of how to modify the model to make it word-level. I am using GloVe pretrained word embeddings.
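For reference, this is roughly how I build embedding_matrix from the GloVe file (a sketch; the file path, the Keras tokenizer, and total_words come from my preprocessing, which I've left out):

import numpy as np

# Load GloVe vectors into a dict: word -> 100-d vector
embeddings_index = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype="float32")

# Row i of embedding_matrix holds the vector for the word with tokenizer index i;
# words missing from GloVe are left as all-zero rows
embedding_matrix = np.zeros((total_words, emb_dimension))
for word, i in tokenizer.word_index.items():
    if i < total_words:
        vector = embeddings_index.get(word)
        if vector is not None:
            embedding_matrix[i] = vector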
My model looks like this:
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.models import Model

emb_dimension = 100  # matches the 100-d GloVe vectors
# Set up embedding layer using pretrained weights
embedding_layer = Embedding(total_words, emb_dimension, input_length=max_input_len, weights=[embedding_matrix], name="Embedding")
# Set up input sequence
encoder_inputs = Input(shape=(None,))
x = embedding_layer(encoder_inputs)
encoder_lstm = LSTM(100, return_state=True)
x, state_h, state_c = encoder_lstm(x)
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
x = embedding_layer(decoder_inputs)
decoder_lstm = LSTM(100, return_sequences=True)
decoder_outputs = decoder_lstm(x, initial_state=encoder_states)
# Feed the decoder LSTM's output (not the raw embeddings) into the softmax layer
decoder_dense = Dense(total_words, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
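Training is done along these lines (a sketch; encoder_input_data, decoder_input_data, and decoder_target_data are padded integer sequences from my preprocessing, with the targets being the decoder inputs shifted one timestep ahead):

# Targets are integer word ids, as expected by sparse_categorical_crossentropy
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=64, epochs=50, validation_split=0.2)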
It seems to train fine, but I don't know how to use this model to process new text. The tutorial has an inference example, but it hasn't been modified for a word-level model, and I can't figure out how to adapt it. In particular, this part of the example:
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)
I tried modifying this code by adding an embedding layer, x = embedding_layer(decoder_inputs), and feeding x into the decoder LSTM instead of decoder_inputs, but I get an error: TypeError: Cannot iterate over a Tensor with unknown first dimension.
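Concretely, my modified attempt looks roughly like this (a sketch reconstructed from my code; I use 100 for the state sizes to match my training LSTMs):

encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(100,))
decoder_state_input_c = Input(shape=(100,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
x = embedding_layer(decoder_inputs)
decoder_outputs, state_h, state_c = decoder_lstm(
    x, initial_state=decoder_states_inputs)  # <- this line raises the TypeError
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)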
How do I set up an inference model?