- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes. Based on the NMT tutorial, I am writing customized code for my own task.
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 14.04 LTS
- TensorFlow installed from (source or binary): Source
- TensorFlow version (use command below): 1.5
- Python version: 3.6.3
- Bazel version (if compiling from source): 0.9.0
- GCC/Compiler version (if compiling from source): 5.4.1
- CUDA/cuDNN version: CUDA 8.0, cuDNN 6
- GPU model and memory: 1080 Ti
- Exact command to reproduce: Explained through this post.
I am writing Seq2Seq code based on the NMT tutorial code (https://github.com/tensorflow/nmt).
I have modified the decoder's output projection to be a stack of fully-connected layers, rather than the single linear projection used in the tutorial, by defining the following custom layer class:
customlayer.py: https://github.com/kami93/ntptest/blob/master/customlayer.py
The custom layer is then initialized like this:
```python
with tf.variable_scope("decoder/output_layer"):
    output_layer = customlayer.FCLayers(
        [256]*2 + [757],
        activation=tf.nn.relu,
        is_decoder_output=True,
        residual=True,
        kernel_initializer=initializer,
        trainable=True)
```
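The actual implementation lives in the linked customlayer.py. Purely as an illustration of the kind of projection described above (stacked fully-connected layers ending in a 757-way output), here is a hypothetical NumPy sketch; the function name and details are made up for this post and residual connections are omitted for brevity:

```python
import numpy as np

def fc_stack(x, sizes, seed=0):
    """Hypothetical sketch of a stack of fully-connected layers.

    ReLU on the hidden layers, linear on the final (logit) layer,
    mirroring the [256, 256, 757] configuration in the issue.
    """
    rng = np.random.default_rng(seed)
    for i, out_dim in enumerate(sizes):
        w = rng.standard_normal((x.shape[-1], out_dim)) * 0.01
        b = np.zeros(out_dim)
        x = x @ w + b
        if i < len(sizes) - 1:      # hidden layers only
            x = np.maximum(x, 0.0)  # ReLU
    return x

# A batch of 4 decoder states of width 256 -> 757-way logits.
logits = fc_stack(np.ones((4, 256)), [256]*2 + [757])
print(logits.shape)  # (4, 757): argmax ids therefore lie in [0, 756]
```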
The layer is then passed as the output_layer of a BeamSearchDecoder like this:
```python
my_decoder = tf.contrib.seq2seq.BeamSearchDecoder(
    cell=cell,
    embedding=embedding_decoder,
    start_tokens=start_tokens,
    end_token=end_token,
    initial_state=decoder_initial_state,
    beam_width=beam_width,
    output_layer=output_layer)
```
Finally, the output sample_id is obtained like this:
```python
outputs, final_context_state, _ = tf.contrib.seq2seq.dynamic_decode(
    my_decoder,
    maximum_iterations=maximum_iterations,
    output_time_major=time_major,
    swap_memory=True,
    scope=decoder_scope)
sample_id = outputs.predicted_ids
```
The problem arises here.
Because the last output dimension of my custom layer is 757, I expect sample_id to contain the argmax indices of the custom layer's output, i.e., values in the range [0, 756].
However, the actual sample_id returned lies in [1, 757] (i.e., "my expected sample_id + 1" is returned).
Inspecting the actual code of tf.contrib.seq2seq.BeamSearchDecoder at https://github.com/tensorflow/tensorflow/blob/r1.5/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py, "_beam_search_step" is implemented between line 510 and line 652.
At line 545, vocab_size is obtained as 757:
```python
vocab_size = logits.shape[-1].value or array_ops.shape(logits)[-1]
```
At line 577, the indices with the top K (beam width) softmax probabilities are selected among all "K*757" flattened hypotheses:
```python
next_beam_scores, word_indices = nn_ops.top_k(scores_flat, k=next_beam_size)
```
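To make that flattening concrete, here is a small NumPy sketch of the same step (using NumPy in place of the TF ops, with toy sizes standing in for K and 757): top-k is taken over the flattened [beam_width * vocab_size] scores, so word_indices are flat indices over all beams and words combined.

```python
import numpy as np

beam_width, vocab_size = 3, 5  # toy sizes standing in for K and 757
rng = np.random.default_rng(0)
scores = rng.standard_normal((beam_width, vocab_size))

# Flatten to [beam_width * vocab_size], as _beam_search_step does.
scores_flat = scores.reshape(-1)

# NumPy equivalent of nn_ops.top_k(scores_flat, k=beam_width):
# indices of the beam_width largest scores, in descending order.
word_indices = np.argsort(scores_flat)[::-1][:beam_width]
next_beam_scores = scores_flat[word_indices]

print(word_indices)  # flat indices in [0, beam_width * vocab_size - 1]
```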
At line 595, the actual word indices are recovered by a modulo operation:
```python
raw_next_word_ids = math_ops.mod(word_indices, vocab_size,
                                 name="next_beam_word_ids")
next_word_ids = math_ops.to_int32(raw_next_word_ids)
```
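As a sanity check on this reading, a NumPy sketch of the same arithmetic shows that a modulo by vocab_size can only ever yield ids in [0, vocab_size - 1]:

```python
import numpy as np

vocab_size = 757
# Every possible flat index for, e.g., beam_width = 10.
word_indices = np.arange(10 * vocab_size)

raw_next_word_ids = np.mod(word_indices, vocab_size)  # word within the vocab
next_beam_ids = word_indices // vocab_size            # which beam it came from

assert raw_next_word_ids.min() == 0
assert raw_next_word_ids.max() == vocab_size - 1      # 756, never 757
```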
As a result, I see no way that indices in [1, 757] could be returned as sample_id. The modulo operation by 757 strictly returns values in [0, 756], so in my opinion a sample_id of 757 should never be returned; yet I am actually getting it.
Can someone please suggest why I am getting sample ids in [1, 757] instead of [0, 756]?