
I'm trying to create a chatbot using an RNN in TensorFlow, based on this introduction: https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html

The model in the example is character-based, but I want a word-level model. The tutorial's "Bonus FAQ" section has a little information on how to modify the model to make it word-level. I am using GloVe pretrained word embeddings.

My model looks like this:

from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

emb_dimension = 100

# Set up embedding layer using pretrained weights
embedding_layer = Embedding(total_words, emb_dimension, input_length=max_input_len, weights=[embedding_matrix], name="Embedding")

# Set up input sequence
encoder_inputs = Input(shape=(None,))
x = embedding_layer(encoder_inputs)
encoder_lstm = LSTM(100, return_state=True)
x, state_h, state_c = encoder_lstm(x)
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
x = embedding_layer(decoder_inputs)
decoder_lstm = LSTM(100, return_sequences=True, return_state=True)
x, _, _ = decoder_lstm(x, initial_state=encoder_states)

decoder_dense = Dense(total_words, activation='softmax')
decoder_outputs = decoder_dense(x)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
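For reference, with sparse_categorical_crossentropy the decoder targets stay as integer word ids rather than one-hot vectors, shifted one step ahead of the decoder inputs (standard teacher forcing). A toy data-preparation sketch, where the padding id 0 and the start/end ids 1 and 2 are made-up placeholders for your own vocabulary:

```python
import numpy as np

max_input_len = 5
# toy id sequences; 0 is the padding id
encoder_input_data = np.array([[3, 4, 5, 0, 0]])
# decoder input starts with a <start> id (1 here); the target is the same
# sequence shifted left by one step, ending with <end> (2 here)
decoder_input_data = np.array([[1, 6, 7, 2, 0]])
decoder_target_data = np.array([[6, 7, 2, 0, 0]])
# model.fit([encoder_input_data, decoder_input_data], decoder_target_data)
```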

It seems to train fine, but I don't know how to use this model to process new text. The tutorial has an inference example, but it hasn't been modified for a word-level model, and I can't figure out how to do that myself. In particular, this bit of the example:

encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

I tried modifying this code by adding an embedding layer (x = embedding_layer(decoder_inputs)) and feeding x into the decoder LSTM, but I get an error: TypeError: Cannot iterate over a Tensor with unknown first dimension.

How do I set up an inference model?

gazm2k5

1 Answer


Writing a decoder for inference is not that easy. First of all, you have to understand that your current decoder uses teacher forcing (meaning that the correct token from the previous position is fed in when predicting the next timestep). For inference you would need either greedy decoding or beam search instead. Those steps are described in this tutorial: https://www.tensorflow.org/addons/tutorials/networks_seq2seq_nmt
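To make this concrete, here is a minimal, self-contained sketch of word-level inference with greedy decoding. It rebuilds a toy, untrained copy of the question's architecture (vocabulary size, start/end token ids, and max_len are placeholders; with a real model you would reuse your trained layers instead). Note that the decoder LSTM must be built with return_state=True so that its three outputs can be unpacked in the inference model; building it with return_sequences=True only and then unpacking three values is the likely cause of the TypeError in the question. The key change from the character-level tutorial is passing decoder_inputs through the shared embedding layer before the LSTM.

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

total_words = 50   # toy vocabulary size
emb_dimension = 100
latent_dim = 100

# --- training-time layers (as in the question, left untrained here) ---
embedding_layer = Embedding(total_words, emb_dimension)

encoder_inputs = Input(shape=(None,))
enc_emb = embedding_layer(encoder_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_emb)
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None,))
dec_emb = embedding_layer(decoder_inputs)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
dec_out, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)
decoder_dense = Dense(total_words, activation="softmax")
decoder_outputs = decoder_dense(dec_out)

# --- inference models ---
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
# key change vs. the character-level tutorial: embed the word ids first
dec_emb2 = embedding_layer(decoder_inputs)
dec_out2, h, c = decoder_lstm(dec_emb2, initial_state=decoder_states_inputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs,
                      [decoder_dense(dec_out2), h, c])

def decode_sequence(input_seq, start_id=1, end_id=2, max_len=10):
    """Greedy decoding: feed the arg-max word id back in at each step."""
    h, c = encoder_model.predict(input_seq, verbose=0)
    target_seq = np.array([[start_id]])
    decoded = []
    for _ in range(max_len):
        probs, h, c = decoder_model.predict([target_seq, h, c], verbose=0)
        next_id = int(np.argmax(probs[0, -1, :]))
        if next_id == end_id:
            break
        decoded.append(next_id)
        target_seq = np.array([[next_id]])
    return decoded

ids = decode_sequence(np.array([[3, 4, 5]]))  # list of predicted word ids
```

With random weights the output ids are meaningless; after training, you would map them back to words with your own id-to-word lookup. Beam search keeps the top-k partial sequences at each step instead of only the arg-max, which is what the linked tutorial covers.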

Sören Etler