
Hi fellow Tensorflowers,

I am trying to implement a sequence-to-sequence model using the new seq2seq module that is under development and released with TF 1.0 and 1.1. There is a `dynamic_decode` function that returns logits in the form of an `rnn_output` tensor.

Then, I need to calculate loss using the output of the RNN.

When I run it naively, just calling `tf.contrib.seq2seq.loss.sequence_loss` with `(rnn_output, targets, weights)`, it crashes with:

InvalidArgumentError (see above for traceback): Incompatible shapes: [1856,1,1024] vs. [9600,1,1024]
         [[Node: optimize/gradients/loss/sequence_loss/sampled_softmax_loss/Mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](optimize/gradients/loss/sequence_loss/sampled_softmax_loss/Mul_grad/Shape/_3099, optimize/gradients/loss/sequence_loss/sampled_softmax_loss/Mul_grad/Shape_1/_3101)]]
         [[Node: optimize/gradients/Add/_824 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:3", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2787_optimize/gradients/Add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:3"](^_cloopMainDynamicDecoderWithAttention/decoder/decoder/while/BasicDecoderStep/multi_rnn_cell/cell_1/multi_rnn_cell/cell_2/lstm_cell/zeros/_128)]]

This is to be expected, since `rnn_output` is dynamically shaped.
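One plausible reading of the numbers in the error (all shapes here are my guesses, not from the traceback itself): with a batch of 64, the decoder stopped after 29 steps for this batch (64 × 29 = 1856 flattened rows), while the padded targets assume a global maximum of 150 steps (64 × 150 = 9600). A NumPy sketch of the "pack to fixed length" idea, mirroring what `tf.pad` would do on the dynamic `rnn_output`:

```python
import numpy as np

# Hypothetical shapes reconstructed from the error message:
# 64 * 29 = 1856 decoded rows vs. 64 * 150 = 9600 target rows.
batch, dec_len, max_len, vocab = 64, 29, 150, 1024

logits = np.random.rand(batch, dec_len, vocab).astype(np.float32)

# Pad the time axis out to the fixed maximum length with zeros.
pad = max_len - dec_len
padded = np.pad(logits, [(0, 0), (0, pad), (0, 0)], mode="constant")

print(padded.shape)  # (64, 150, 1024)
```

With per-step weights of zero on the padded positions, the extra steps contribute nothing to the loss.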

I see two possible solutions:

  1. "Pack" the dynamic tensor into a tensor of size equal to the maximum allowed length. I don't know how to pack a dynamic tensor into a fixed-size one, but it probably involves the new interfaces for dynamic shapes: tf.while_loop and TensorArrays. It would be great to hear some advice on that.
  2. Calculate sequence_loss dynamically. But my knowledge of TensorFlow's internals is too limited to assess whether that's easy to do. Any suggestions here?
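To make the shape bookkeeping concrete (all names and sizes below are hypothetical, not from the question): instead of padding the logits up to the fixed maximum, you can equivalently slice the padded targets and weights down to however many steps the decoder actually produced. A NumPy sketch:

```python
import numpy as np

# Hypothetical toy sizes. dec_len stands in for what
# tf.shape(logits)[1] would report at run time.
batch, max_len, vocab = 4, 10, 7
dec_len = 6

logits = np.random.rand(batch, dec_len, vocab)
targets = np.random.randint(0, vocab, size=(batch, max_len))
weights = (targets != 0).astype(np.float32)  # assume id 0 is padding

# Slice targets/weights down to the decoded length so all three
# tensors agree on [batch, time].
targets = targets[:, :dec_len]
weights = weights[:, :dec_len]

assert logits.shape[:2] == targets.shape == weights.shape
```

The equivalent TF op would be a dynamic slice on the time axis; either direction (pad logits up or slice targets down) reconciles the shapes.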

The general question

What is the right approach to calculate sampled/normal softmax cross-entropy loss from dynamically shaped rnn_output of dynamic_decode?

I have the following code:

decoder_outputs, decoder_state = seq2seq.dynamic_decode(
    my_decoder, output_time_major=False,
    parallel_iterations=512, swap_memory=True)

self.logits = decoder_outputs.rnn_output
self.loss = loss.sequence_loss(
    self.logits,
    tf.transpose(tf.stack(targets), [1, 0], name="targets_"),
    tf.transpose(tf.stack(self.target_weights), [1, 0], name="weights_"),
    softmax_loss_function=softmax_loss_function)

ipdb> tf.__version__
'1.1.0-rc0'

python: 2.7

M Z
mhnatiuk

2 Answers

2

This is definitely an issue with tf.contrib.seq2seq.loss.sequence_loss. If you use dynamic RNNs and don't unroll your BPTT manually, you can use a much simpler loss function.

What I did is basically:

loss = tf.reduce_sum(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=self.answers,
        logits=presoftmax
    )
) / self.batch_sz

I know it's not purely rigorous. You'll need to adapt it to your task; it's just a hint.
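For reference, a masked variant of this idea, sketched in NumPy under the assumption of batch-major [batch, time, vocab] logits (function name and shapes are mine): multiply the per-step cross-entropy by 0/1 weights so padding steps don't contribute, which is essentially what sequence_loss computes internally.

```python
import numpy as np

def masked_seq_loss(logits, targets, weights):
    # Log-softmax over the vocab axis, gather each target token's
    # log-probability, mask padded steps, average over real tokens.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    b, t = np.indices(targets.shape)
    nll = -log_probs[b, t, targets]
    return (nll * weights).sum() / weights.sum()

# Sanity check: uniform logits give a loss of log(vocab_size).
batch, time, vocab = 2, 3, 5
logits = np.zeros((batch, time, vocab))
targets = np.zeros((batch, time), dtype=int)
weights = np.ones((batch, time))
print(masked_seq_loss(logits, targets, weights))  # log(5) ~ 1.609
```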

1

I guess you are using GreedyEmbeddingHelper? During training, you should use TF's TrainingHelper. The output dimension then matches your target dimension, because at every time step the target is fed in as the input.
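A toy illustration (pure Python, dummy tokens, no TF) of why this fixes the shape mismatch: a greedy decoder's output length depends on when it happens to emit the end token, while a teacher-forced decoder always takes exactly as many steps as there are target positions.

```python
END = 0  # hypothetical end-of-sequence token id

def greedy_decode(next_token, max_len):
    # Stops as soon as the model emits END -> variable length.
    out = []
    for _ in range(max_len):
        tok = next_token()
        out.append(tok)
        if tok == END:
            break
    return out

def teacher_forced_decode(targets):
    # One step per target position, regardless of content -> fixed length.
    return [t for t in targets]

tokens = iter([5, 3, END, 9, 9])
print(len(greedy_decode(lambda: next(tokens), max_len=5)))  # 3
print(len(teacher_forced_decode([7, 7, 7, 7, 7])))          # 5
```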

Miaosen Wang