
I need to run an encoder-decoder model in TensorFlow. I see that using the available APIs such as basic_rnn_seq2seq(encoder_input_data, decoder_input_data, lstm_cell), an encoder-decoder system can be created.

  1. How can we feed pre-trained embeddings such as word2vec into such a model? I am aware that we can do an embedding lookup, but as per the API, encoder_input_data is a list of 2D Tensors of shape batch_size x input_size. How can each word be represented by its respective word embedding in this setup? Even embedding_rnn_seq2seq extracts the embeddings internally. How can pre-computed word embeddings be given as input?
  2. How can we get the cost/perplexity through the API?
  3. For test instances, we may not know the corresponding decoder inputs. How should such a case be handled?
user3480922

1 Answer


First question: probably not the best way, but what I did was the following, after building the model and before training starts:

for v in tf.trainable_variables():
  if v.name == 'embedding_rnn_seq2seq/RNN/EmbeddingWrapper/embedding:0':
    # Overwrite the randomly initialized embedding matrix with the
    # pre-trained word2vec values.
    assign_op = v.assign(my_word2vec_matrix)
    session.run(assign_op)  # or `assign_op.op.run()`

my_word2vec_matrix is a matrix of shape vocabulary_size x embedding_size, filled with my precomputed embedding vectors. Use this (or something similar) if you believe your embeddings are really good. Otherwise the seq2seq model will, over time, come up with its own trained embedding.
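
For completeness, here is a minimal sketch of how such a matrix could be built, assuming you have the model's word-to-ID mapping (called vocab below) and something that maps words to pre-trained vectors (called word2vec_model below, e.g. a gensim KeyedVectors object); both names are placeholders, not part of the TensorFlow API:

import numpy as np

# vocab          : dict mapping word -> ID used by the seq2seq model (assumed)
# word2vec_model : supports `word in word2vec_model` and `word2vec_model[word]`
#                  returning a 1D vector of length embedding_size (assumed)
vocabulary_size = len(vocab)
my_word2vec_matrix = np.random.uniform(
    -0.1, 0.1, (vocabulary_size, embedding_size)).astype(np.float32)

for word, idx in vocab.items():
  if word in word2vec_model:          # keep the random row for unknown words
    my_word2vec_matrix[idx] = word2vec_model[word]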

Second question: in seq2seq.py there is a call to model_with_buckets(), which you can find in python/ops/seq2seq.py. The loss is returned from there.
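
If you build the losses yourself with model_with_buckets(), getting perplexity from the returned loss is straightforward: exponentiate the average per-symbol cross-entropy, the same way the old translation tutorial reports it. A rough sketch (the placeholder and variable names below are assumptions, not the exact tutorial code):

import math

# losses_per_bucket comes from tf.nn.seq2seq.model_with_buckets(
#     encoder_inputs, decoder_inputs, targets, target_weights, buckets, seq2seq_f)
# and holds one average cross-entropy-per-symbol tensor per bucket.
loss_value = session.run(losses_per_bucket[bucket_id], feed_dict=feed)
perplexity = math.exp(loss_value) if loss_value < 300 else float('inf')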

Third question: in the test case each decoder input is the decoder output from the timestep before (i.e. the first decoder input is a special GO symbol, the second decoder input is the decoder output of the first timestep, the third decoder input is the decoder output of the second timestep, and so on).
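
This loop-back behaviour is already built into embedding_rnn_seq2seq via its feed_previous argument: with feed_previous=False (training) the ground-truth decoder inputs are used, with feed_previous=True (decoding) only the GO symbol in decoder_inputs[0] matters and each later input is the embedded argmax of the previous output. A minimal sketch, assuming placeholder lists encoder_inputs / decoder_inputs and sizes defined elsewhere (module paths are the old tf.nn ones from that era):

import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)
# At test time only decoder_inputs[0] (the GO symbol) is real input;
# feed_previous=True loops each decoder output back in as the next input.
outputs, state = tf.nn.seq2seq.embedding_rnn_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols=vocabulary_size,
    num_decoder_symbols=vocabulary_size,
    embedding_size=embedding_size,
    feed_previous=True)

# Greedy decoding: take the highest-scoring symbol at every timestep.
predicted_ids = [tf.argmax(logits, 1) for logits in outputs]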

Phillip Bock
  • Okay, thanks. So where do we feed my_word2vec_matrix into the API? Is encoder_cell (in embedding_attention_seq2seq) the embedding matrix that needs to be replaced with the tf.embedding_lookup? – user3480922 Aug 01 '16 at 13:42
  • The embedding_rnn_seq2seq function you use does this automatically. By the way, you need to correct the name in my snippet from embedding_attention_seq2seq to embedding_rnn_seq2seq. – Phillip Bock Aug 01 '16 at 13:44
  • With "automatically" I mean: The embedding_rnn_seq2seq uses an embedding matrix. My assign_op assigns YOUR matrix to the embedding-matrix used in the model – Phillip Bock Aug 01 '16 at 13:45
  • Okay, I got it. Just a little confusion: as far as I can see, `encoder_cell = rnn_cell.EmbeddingWrapper(cell, embedding_classes=num_encoder_symbols, embedding_size=embedding_size)` creates the embedding matrix. So is it the case that encoder_cell is overwritten by my_word2vec_matrix? If not, please correct me. – user3480922 Aug 01 '16 at 13:52
  • It does. Or better: it creates a randomly filled embedding matrix and "connects" it to the inputs and the "uphill" part of your model. My assign_op then overwrites the randomly created values with the embedding values you want to assign. – Phillip Bock Aug 01 '16 at 13:55
  • Great, thanks. So we simply have to pass the lookup_matrix corresponding to the respective words, right? – user3480922 Aug 01 '16 at 14:00
  • Yes, "pass the lookup_matrix corresponding to respective words" is correct. Make sure the IDs correspond in your vocabulary sets (there are two sets: your embeddings from word2vec and the ones that the seq2seq model creates). Good luck! – Phillip Bock Aug 01 '16 at 14:07