
I need to run an encoder-decoder model in TensorFlow. I see that using the available APIs such as basic_rnn_seq2seq(encoder_input_data, decoder_input_data, lstm_cell), an encoder-decoder system can be created.

  1. How can we feed pre-trained embeddings such as word2vec into such a model? I am aware that we can do an embedding lookup, but as per the API, encoder_input_data is a list of 2D Tensors of shape batch_size x input_size. How can each word be represented by its respective word embedding in this setup? Even embedding_rnn_seq2seq extracts the embeddings internally. How can pre-computed word embeddings be given as input?
  2. How can we get the cost/perplexity through the API?
  3. For test instances, we may not know the corresponding decoder inputs. How should such a case be handled?
user3480922

1 Answer


First question: probably not the best way, but what I did was the following, after building the model and before training starts:

for v in tf.trainable_variables():
  if v.name == 'embedding_rnn_seq2seq/RNN/EmbeddingWrapper/embedding:0':
    # Overwrite the randomly initialized embedding matrix with the
    # pre-trained word2vec values.
    assign_op = v.assign(my_word2vec_matrix)
    session.run(assign_op)  # or `assign_op.op.run()`

my_word2vec_matrix is a matrix of shape vocabulary_size x embedding_size, filled with my precomputed embedding vectors. Use this (or something similar) if you believe your embeddings are really good. Otherwise the seq2seq model will, over time, come up with its own trained embedding.
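
For completeness, here is a minimal sketch of how such a matrix could be built, assuming you have the model's word-to-ID mapping (called vocab below) and something that maps words to pre-trained vectors (called word2vec_model below, e.g. a gensim KeyedVectors object); both names are placeholders, not part of the TensorFlow API:

import numpy as np

# vocab          : dict mapping word -> ID used by the seq2seq model (assumed)
# word2vec_model : supports `word in word2vec_model` and `word2vec_model[word]`
#                  returning a 1D vector of length embedding_size (assumed)
vocabulary_size = len(vocab)
my_word2vec_matrix = np.random.uniform(
    -0.1, 0.1, (vocabulary_size, embedding_size)).astype(np.float32)

for word, idx in vocab.items():
  if word in word2vec_model:          # keep the random row for unknown words
    my_word2vec_matrix[idx] = word2vec_model[word]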

Second question: in seq2seq.py there is a call to model_with_buckets(), which you can find in python/ops/seq2seq.py. The loss is returned from there.
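
If you build the losses yourself with model_with_buckets(), getting perplexity from the returned loss is straightforward: exponentiate the average per-symbol cross-entropy, the same way the old translation tutorial reports it. A rough sketch (the placeholder and variable names below are assumptions, not the exact tutorial code):

import math

# losses_per_bucket comes from tf.nn.seq2seq.model_with_buckets(
#     encoder_inputs, decoder_inputs, targets, target_weights, buckets, seq2seq_f)
# and holds one average cross-entropy-per-symbol tensor per bucket.
loss_value = session.run(losses_per_bucket[bucket_id], feed_dict=feed)
perplexity = math.exp(loss_value) if loss_value < 300 else float('inf')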

Third question: in the test case each decoder input is the decoder output from the timestep before (i.e. the first decoder input is a special GO symbol, the second decoder input is the decoder output of the first timestep, the third decoder input is the decoder output of the second timestep, and so on).
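
This loop-back behaviour is already built into embedding_rnn_seq2seq via its feed_previous argument: with feed_previous=False (training) the ground-truth decoder inputs are used, with feed_previous=True (decoding) only the GO symbol in decoder_inputs[0] matters and each later input is the embedded argmax of the previous output. A minimal sketch, assuming placeholder lists encoder_inputs / decoder_inputs and sizes defined elsewhere (module paths are the old tf.nn ones from that era):

import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)
# At test time only decoder_inputs[0] (the GO symbol) is real input;
# feed_previous=True loops each decoder output back in as the next input.
outputs, state = tf.nn.seq2seq.embedding_rnn_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols=vocabulary_size,
    num_decoder_symbols=vocabulary_size,
    embedding_size=embedding_size,
    feed_previous=True)

# Greedy decoding: take the highest-scoring symbol at every timestep.
predicted_ids = [tf.argmax(logits, 1) for logits in outputs]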

Phillip Bock
  • Okay, thanks. So where do we feed my_word2vec_matrix into the API? Is encoder_cell (in embedding_attention_seq2seq) the embedding matrix that needs to be replaced with the tf.embedding_lookup? – user3480922 Aug 01 '16 at 13:42
  • The embedding_rnn_seq2seq function you use does this automatically. By the way, you need to correct the name in my snippet from embedding_attention_seq2seq to embedding_rnn_seq2seq. – Phillip Bock Aug 01 '16 at 13:44
  • With "automatically" I mean: The embedding_rnn_seq2seq uses an embedding matrix. My assign_op assigns YOUR matrix to the embedding-matrix used in the model – Phillip Bock Aug 01 '16 at 13:45
  • Okay, I got it. Just a little confusion: as far as I can see, `encoder_cell = rnn_cell.EmbeddingWrapper(cell, embedding_classes=num_encoder_symbols, embedding_size=embedding_size)` creates the embedding matrix. So is it the case that encoder_cell is overwritten by my_word2vec_matrix? If not, please correct me. – user3480922 Aug 01 '16 at 13:52
  • It does. Or better: it creates a randomly filled embedding matrix and "connects" it to the inputs and the "uphill" part of your model. My assign_op then overwrites the randomly created values with the embedding values you want to assign. – Phillip Bock Aug 01 '16 at 13:55
  • Great, thanks. So we simply have to pass the lookup_matrix corresponding to the respective words, right? – user3480922 Aug 01 '16 at 14:00
  • Yes, "pass the lookup_matrix corresponding to respective words" is correct. Make sure the IDs correspond in your vocabulary sets (there are two sets: your embeddings from word2vec and the ones that the seq2seq model creates). Good luck! – Phillip Bock Aug 01 '16 at 14:07