Hi fellow Tensorflowers,
I am trying to implement a sequence-to-sequence model using the new seq2seq module that is under development and released with TF 1.0 and 1.1.
There is a `dynamic_decode` function that returns logits in the form of an `rnn_output` field. Then I need to calculate the loss using this output of the RNN.
When I run it naively, just by calling `tf.contrib.seq2seq.loss.sequence_loss` with `(rnn_output, targets, weights)`, it crashes with:
```
InvalidArgumentError (see above for traceback): Incompatible shapes: [1856,1,1024] vs. [9600,1,1024]
[[Node: optimize/gradients/loss/sequence_loss/sampled_softmax_loss/Mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](optimize/gradients/loss/sequence_loss/sampled_softmax_loss/Mul_grad/Shape/_3099, optimize/gradients/loss/sequence_loss/sampled_softmax_loss/Mul_grad/Shape_1/_3101)]]
[[Node: optimize/gradients/Add/_824 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:3", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2787_optimize/gradients/Add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:3"](^_cloopMainDynamicDecoderWithAttention/decoder/decoder/while/BasicDecoderStep/multi_rnn_cell/cell_1/multi_rnn_cell/cell_2/lstm_cell/zeros/_128)]]
```
Which is natural, since `rnn_output` is dynamically shaped: its time dimension depends on the longest decoded sequence in the batch, while the targets and weights are padded to the fixed maximum length, so the flattened batch × time shapes disagree.
I see two possible solutions:

- "Pack" the dynamic tensor into a tensor whose size equals the maximum allowed length. I don't know how to pack a dynamic tensor into a fixed-size tensor, but it probably involves the new interfaces for dynamic shapes, `tf.while_loop` and `TensorArray`s. It would be great to hear some advice on that (a rough sketch of what I have in mind follows this list).
- Dynamically calculate `sequence_loss`. But my knowledge of TensorFlow's internals is too limited to assess whether that is easy to do. Any suggestions here?
**The general question**

What is the right approach to calculating sampled/normal softmax cross-entropy loss from the dynamically shaped `rnn_output` of `dynamic_decode`?
I have the following code:
```python
# dynamic_decode returns outputs whose time dimension is the longest decoded
# length in the batch (batch-major here, since output_time_major=False).
decoder_outputs, decoder_state = seq2seq.dynamic_decode(
    my_decoder, output_time_major=False, parallel_iterations=512,
    swap_memory=True)
self.logits = decoder_outputs.rnn_output
# targets and target_weights are time-major lists, transposed to batch-major.
self.loss = loss.sequence_loss(
    self.logits,
    tf.transpose(tf.stack(targets), [1, 0], name="targets_"),
    tf.transpose(tf.stack(self.target_weights), [1, 0], name="weights_"),
    softmax_loss_function=softmax_loss_function)
```
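And this is roughly how I imagine wiring the padding into the loss above: pad the logits along the time axis so their time dimension matches the batch-major targets and weights. Again untested, and `targets_bm`, `weights_bm`, and `pad_len` are placeholder names; whether zero-padding the logits is the semantically right thing to do is part of my question:

```python
# Hypothetical sketch: pad the logits so their time dimension matches the
# batch-major targets/weights before calling sequence_loss.
targets_bm = tf.transpose(tf.stack(targets), [1, 0], name="targets_")
weights_bm = tf.transpose(tf.stack(self.target_weights), [1, 0], name="weights_")
pad_len = tf.shape(targets_bm)[1] - tf.shape(self.logits)[1]
logits_padded = tf.pad(self.logits, [[0, 0], [0, pad_len], [0, 0]])
self.loss = loss.sequence_loss(logits_padded, targets_bm, weights_bm,
                               softmax_loss_function=softmax_loss_function)
```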
Version info: `tf.__version__` is `'1.1.0-rc0'` (checked from ipdb), Python 2.7.