  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes. Based on the NMT tutorial, I am writing customized code for my own task.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 14.04 LTS
  • TensorFlow installed from (source or binary): Source
  • TensorFlow version (use command below): 1.5
  • Python version: 3.6.3
  • Bazel version (if compiling from source): 0.9.0
  • GCC/Compiler version (if compiling from source): 5.4.1
  • CUDA/cuDNN version: CUDA 8.0, cuDNN 6
  • GPU model and memory: 1080 Ti
  • Exact command to reproduce: To be explained through this post.

I am writing Seq2Seq code based on the NMT tutorial code (https://github.com/tensorflow/nmt).

I have modified the decoder's output projector to be a stack of fully connected layers rather than the single linear projector used in the tutorial code, by defining the following custom Layer class:

customlayer.py

https://github.com/kami93/ntptest/blob/master/customlayer.py
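
The details are in the file above; for context, below is a simplified sketch of what such a stack of fully connected layers could look like. The constructor arguments mirror the call further down (is_decoder_output is accepted but unused in the sketch); everything else is my own rough reconstruction, not the actual contents of customlayer.py.

import tensorflow as tf
from tensorflow.python.layers import base as layers_base

class FCLayers(layers_base.Layer):
  """Rough sketch only -- the real implementation is customlayer.py above."""

  def __init__(self, units, activation=None, is_decoder_output=False,
               residual=False, kernel_initializer=None, **kwargs):
    super(FCLayers, self).__init__(**kwargs)
    self.units = units
    self.residual = residual
    self.sublayers = []
    for i, n in enumerate(units):
      # No activation on the last layer, so it emits raw logits of size units[-1].
      act = activation if i < len(units) - 1 else None
      self.sublayers.append(tf.layers.Dense(
          n, activation=act, kernel_initializer=kernel_initializer))

  def call(self, inputs):
    x = inputs
    for layer in self.sublayers:
      y = layer(x)
      # Residual connection only where the widths agree.
      if self.residual and x.shape[-1:].is_compatible_with(y.shape[-1:]):
        y = x + y
      x = y
    return x

  def _compute_output_shape(self, input_shape):
    # dynamic_decode asks the output layer for its projected size; TF 1.5 uses
    # the underscored name (compute_output_shape in later releases).
    input_shape = tf.TensorShape(input_shape)
    return input_shape[:-1].concatenate([self.units[-1]])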

and then initialized the custom layer like this:

with tf.variable_scope("decoder/output_layer"):
  output_layer = customlayer.FCLayers(
    [256]*2 + [757],
    activation=tf.nn.relu,
    is_decoder_output=True,
    residual=True,
    kernel_initializer=initializer,
    trainable=True)

and then passed the layer as the output_layer of a BeamSearchDecoder like this:

my_decoder = tf.contrib.seq2seq.BeamSearchDecoder(
          cell=cell,
          embedding=embedding_decoder,
          start_tokens=start_tokens,
          end_token=end_token,
          initial_state=decoder_initial_state,
          beam_width=beam_width,
          output_layer=output_layer)
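
For context, the remaining constructor arguments are prepared as in the NMT tutorial. The sketch below shows that usual pattern; tgt_sos_id, tgt_eos_id, encoder_state and batch_size are assumed to be defined elsewhere in my code.

start_tokens = tf.fill([batch_size], tgt_sos_id)
end_token = tgt_eos_id
# The encoder state is tiled beam_width times for beam search.
decoder_initial_state = tf.contrib.seq2seq.tile_batch(
    encoder_state, multiplier=beam_width)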

and finally got the output sample_id like this:

outputs, final_context_state, _ = tf.contrib.seq2seq.dynamic_decode(
        my_decoder,
        maximum_iterations=maximum_iterations,
        output_time_major=time_major,
        swap_memory=True,
        scope=decoder_scope)
sample_id = outputs.predicted_ids
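
With output_time_major=False, predicted_ids should come back with shape [batch_size, max_time, beam_width] and the beams sorted by score, so (just as an illustration) the top hypothesis would be:

best_sample_id = sample_id[:, :, 0]  # beam 0 is the highest-scoring hypothesis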

The problem arises here.

Because the last output dimension of my custom layer is 757, I expect sample_id to contain the argmax indices of the custom layer's output, i.e., values in [0, 756].

However, the actual sample_id returned is in [1, 757] (i.e., my expected sample_id + 1 is returned).

Inspecting the actual code of tf.contrib.seq2seq.BeamSearchDecoder at https://github.com/tensorflow/tensorflow/blob/r1.5/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py, the implementation of "_beam_search_step" is on lines 510 to 652.

At line 545, vocab_size is obtained as 757:

vocab_size = logits.shape[-1].value or array_ops.shape(logits)[-1]

At line 577, the indices with the top K (beam width) softmax probabilities are determined among all K*757 nested hypotheses:

next_beam_scores, word_indices = nn_ops.top_k(scores_flat, k=next_beam_size)

At line 595, the actual word indices are calculated by a modulo operation:

raw_next_word_ids = math_ops.mod(word_indices, vocab_size,
                                 name="next_beam_word_ids")
next_word_ids = math_ops.to_int32(raw_next_word_ids)

As a result, I see no reason why indices in [1, 757] should be returned as sample_id. The modulo operation by 757 strictly returns values in [0, 756], so a sample_id of 757 should never appear, yet I am actually getting it.
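
To make that arithmetic concrete, here is a tiny standalone re-creation of the indexing (beam_width=3 and the random scores are made up purely for illustration):

import numpy as np

vocab_size = 757
beam_width = 3

# scores_flat has shape [batch_size, beam_width * vocab_size]; fake a single row.
scores_flat = np.random.randn(1, beam_width * vocab_size)

# Top-k over the flattened hypotheses, as at line 577.
word_indices = np.argsort(-scores_flat, axis=-1)[:, :beam_width]

# Recover the word id and the originating beam, as at line 595.
next_word_ids = word_indices % vocab_size    # always in [0, 756]
next_beam_ids = word_indices // vocab_size   # always in [0, beam_width - 1]

assert next_word_ids.max() <= vocab_size - 1  # a sample_id of 757 is impossible here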

Can someone please suggest why I am getting sample_id values in [1, 757] instead of [0, 756]?

  • Is it possible that your vocab_size does not include the `sos_id` and `eos_id`? In that case, the vocab_size actually becomes 757 + 2 = 759, and `sos_id=0` would not appear in the sample_id, which would explain the sample_id starting from 1. – lifang May 25 '18 at 14:59

0 Answers