
The examples in the official documentation seem to jump straight into attention models without showing how to use the basic seq2seq model. I'm attempting to translate dates from a variety of formats into a single standard format. Some examples are shown below:

[('7 07 13', '2013-07-07'),
 ('30 JULY 1977', '1977-07-30'),
 ('Tuesday, September 14, 1971', '1971-09-14'),
 ('18 09 88', '1988-09-18'),
 ('31, Aug 1986', '1986-08-31')]

where the second element of each tuple is the target output 'y'.

Here is how I plan on using a seq2seq model at a high level:

  1. Embed both the input and output characters into 10-dimensional vectors.
  2. Pad the input sequences so that they are all a fixed length (29 in my case; the output is of length 11) -- see the sketch after this list.
  3. Pass those 29 characters into seq2seq so that it outputs 11 logit vectors over the output vocabulary, one per output time step.
  4. Use the average cross-entropy loss (across time steps and batch size) to get a loss value.
  5. Optimise this loss.
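
For step 2, here is roughly how I pad and encode the data (a minimal sketch; char2numX and char2numY are the character-to-index dictionaries used in the model below, and the '<PAD>' entry is my own convention):

import numpy as np

def pad_and_encode(s, char2num, max_len):
    # map each character to its integer id, then pad to a fixed length
    ids = [char2num[c] for c in s]
    return ids + [char2num['<PAD>']] * (max_len - len(ids))

X = np.array([pad_and_encode(x, char2numX, 29) for x, y in data])  # shape (n, 29)
Y = np.array([pad_and_encode(y, char2numY, 11) for x, y in data])  # shape (n, 11)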

The model that I have so far is as follows:

import tensorflow as tf
import tensorflow.contrib.legacy_seq2seq as seq2seq

enc_inp = [tf.placeholder(tf.int32, shape=(None,)) for t in range(x_seq_length)]
# decoder inputs: one placeholder per *input* time step, which I suspect
# is where the length of 29 in the error below comes from
dec_inp = [tf.placeholder(tf.int32, shape=(None,)) for t in range(x_seq_length)]
labels = [tf.placeholder(tf.int32, shape=(None,)) for t in range(y_seq_length)]
is_train = tf.placeholder(tf.bool)
weights = [tf.ones_like(labels_t, dtype=tf.float32) for labels_t in labels]

memory_dim = 32
embed_dim = 32
cell = tf.contrib.rnn.BasicLSTMCell(memory_dim)
dec_outputs, dec_memory = seq2seq.embedding_rnn_seq2seq(enc_inp, dec_inp, cell, 
                                                        len(char2numX), len(char2numY), 
                                                        embed_dim,
                                                        feed_previous = tf.logical_not(is_train))

loss = seq2seq.sequence_loss(dec_outputs, labels, weights)

However, it complains that "Lengths of logits, weights, and targets must be the same 29, 11, 11", because (I think) dec_outputs has length 29, whereas I was hoping it would somehow have length 11.
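A quick sanity check confirms where the 29 comes from: the decoder emits one output per element of dec_inp, so dec_outputs does not match labels and weights in length:

print(len(dec_outputs), len(labels), len(weights))  # prints: 29 11 11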

  1. My question is: given that I am translating from a sequence of length 29 to a sequence of length 11, how am I supposed to do this in TensorFlow?
  2. Also, correct me if I'm wrong: the input to these models has shape [time_steps, batch_size], i.e. batch_size is the second dimension, not the first? I got this impression from a few tutorials I read (see the feeding sketch below).
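
For reference, this is how I build the feed dict, which is why I believe the data has to be time-major (X_batch, Y_batch and D_batch are my own names for the padded id arrays, transposed to [time_steps, batch_size]):

# each placeholder holds one time step for the whole batch, so the
# arrays are indexed time-first: X_batch has shape [x_seq_length, batch_size]
feed_dict = {enc_inp[t]: X_batch[t] for t in range(x_seq_length)}
feed_dict.update({labels[t]: Y_batch[t] for t in range(y_seq_length)})
feed_dict.update({dec_inp[t]: D_batch[t] for t in range(len(dec_inp))})
feed_dict[is_train] = True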

Full code with data is available here.
