Background info: I'm working on sequence-to-sequence models, and right now my model accepts variable-length input tensors (not lists) with input shapes corresponding to [batch size, sequence length]. However, in my implementation, sequence length is unspecified (set to None) to allow for variable length inputs. Specifically, input sequence batches are padded only to the length of the longest sequence in that batch. This has sped up my training time considerably, so I'd prefer to keep it this way, as opposed to going back to bucketed models and/or padded all sequences in the training data to the same length. I'm using TensorFlow 1.0.0.
Problem: I'm currently using the following to compute the loss (which runs just fine).
loss = tf.losses.sparse_softmax_cross_entropy(
weights=target_labels, # shape: [batch size, None]
logits=outputs[:, :-1, :], # shape: [batch size, None, vocab size]
weights=target_weights[:, :-1]) # shape: [batch size, None]
where vocab size is typically about 40,000. I'd like to use a sampled softmax, but I've ran into an issue that's due to the unspecified nature of the input shape. According to the documentation for tf.nn.sampled_softmax_loss, it requires the inputs to be fed separately for each timestep. However, I can't call, for example,
tf.unstack(target_labels, axis=1)
since the axis is unknown beforehand.Does anyone know how I might go about implementing this? One would assume that since both dynamic_rnn and tf.losses.sparse_softmax_cross_entropy seem to have no issue doing this, that a workaround could be implemented with the sampled softmax loss somehow. After digging around in the source code and even models repository, I've come up empty handed. Any help/suggestions would be greatly appreciated.