The official documentation seems to jump straight into attention models without showing how to use the basic seq2seq model. I'm attempting to translate from assorted date formats into one standard format. Some examples are shown below:
[('7 07 13', '2013-07-07'),
('30 JULY 1977', '1977-07-30'),
('Tuesday, September 14, 1971', '1971-09-14'),
('18 09 88', '1988-09-18'),
('31, Aug 1986', '1986-08-31')]
where the second column is the output 'y'.
Here is how I plan on using a seq2seq model at a high level:
- Embed both the input and output characters into 10-dimensional vectors.
- Pad the input sequences so that they are all a fixed length (29 in my case; the outputs are of length 11; see the preprocessing sketch after this list).
- Pass those 29 characters into the seq2seq model so that it outputs 11 logit vectors of size hidden_size.
- Use the average cross-entropy loss (across time steps and batch size) to get a loss value.
- Optimise this loss.
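For context, below is roughly how I prepare the data. Here char2numX and char2numY are assumed to be dicts mapping each character (plus a '<PAD>' entry) to an integer id, and pad_and_encode is just a helper name I made up for this sketch:
import numpy as np

def pad_and_encode(text, char2num, length):
    # map each character to its id, then right-pad up to the fixed length
    ids = [char2num[c] for c in text]
    return ids + [char2num['<PAD>']] * (length - len(ids))

# data is the list of (input, output) pairs shown above
x = np.array([pad_and_encode(src, char2numX, x_seq_length) for src, tgt in data])  # [num_examples, 29]
y = np.array([pad_and_encode(tgt, char2numY, y_seq_length) for src, tgt in data])  # [num_examples, 11]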
The model that I have so far is as follows:
import tensorflow as tf
import tensorflow.contrib.legacy_seq2seq as seq2seq

# one placeholder per time step: 29 encoder inputs, 11 target labels
enc_inp = [tf.placeholder(tf.int32, shape=(None,)) for t in range(x_seq_length)]
labels = [tf.placeholder(tf.int32, shape=(None,)) for t in range(y_seq_length)]
is_train = tf.placeholder(tf.bool)

# weight every target time step equally in the loss
weights = [tf.ones_like(labels_t, dtype=tf.float32) for labels_t in labels]

memory_dim = 32
embed_dim = 32
cell = tf.contrib.rnn.BasicLSTMCell(memory_dim)

# dec_inp is built in the full code linked below; judging by the error,
# it seems to contain one tensor per *encoder* time step (29 of them)
dec_outputs, dec_memory = seq2seq.embedding_rnn_seq2seq(
    enc_inp, dec_inp, cell,
    num_encoder_symbols=len(char2numX),
    num_decoder_symbols=len(char2numY),
    embedding_size=embed_dim,
    feed_previous=tf.logical_not(is_train))

loss = seq2seq.sequence_loss(dec_outputs, labels, weights)
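Once that loss builds, I was planning to minimise it with a standard optimiser along these lines (Adam and the learning rate are just my choice, nothing prescribed):
train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)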
However, it complains that "Lengths of logits, weights, and targets must be the same 29, 11, 11", because (I think) dec_outputs is of length 29, whereas I was hoping it would somehow be of length 11.
- My question is: given that I am translating from a sequence of length 29 to a sequence of length 11, how am I supposed to do this in TensorFlow?
- Also, correct me if I'm wrong, but the input to these models is of shape [time_steps, batch_size]? Notice that batch_size is the second dimension, not the first. I got this impression from a few tutorials I read.
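For reference, this is how I am feeding batches: each per-time-step placeholder gets one column of a [batch_size, time_steps] integer array (batch_x/batch_y are assumed to come from the padding step above, and sess is an ordinary tf.Session):
# batch_x: [batch_size, 29] int array, batch_y: [batch_size, 11] int array
feed_dict = {enc_inp[t]: batch_x[:, t] for t in range(x_seq_length)}
feed_dict.update({labels[t]: batch_y[:, t] for t in range(y_seq_length)})
feed_dict[is_train] = True
loss_val = sess.run(loss, feed_dict=feed_dict)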
Full code with data is available here.