I am building a chatbot with a sequence-to-sequence encoder-decoder model, as in NMT. From the data given, I understand that during training they feed the decoder outputs into the decoder inputs along with the encoder cell states. What I cannot figure out is what I should feed into the decoder when I actually deploy the chatbot in real time, since at that point the output is exactly what I have to predict. Can someone help me out with this, please?

-
I am also following https://github.com/tensorflow/nmt and I have the same problem. Did you find a solution? – Jignasha Royala Jan 15 '18 at 11:53
1 Answer
The exact answer depends on which building blocks you take from the Neural Machine Translation (NMT) model and which ones you replace with your own. I assume the graph structure is exactly as in NMT.
If so, at inference time, you can feed just a vector of zeros to the decoder.
Internal details: NMT uses an entity called Helper to determine the next input in the decoder (see the tf.contrib.seq2seq.Helper documentation). In particular, tf.contrib.seq2seq.BasicDecoder relies solely on the helper when it performs a step: the next_inputs fed to the subsequent cell are exactly the return value of Helper.next_inputs().
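Roughly, each decoding step delegates both the sampled token and the next input to the helper. The snippet below is a simplified paraphrase of that step, not the actual TensorFlow source:

```python
# Simplified paraphrase of what tf.contrib.seq2seq.BasicDecoder.step() does at
# every decoding step (for illustration only, not the real implementation):
def decoder_step(cell, helper, time, inputs, state, output_layer=None):
    cell_outputs, cell_state = cell(inputs, state)            # run the RNN cell
    if output_layer is not None:
        cell_outputs = output_layer(cell_outputs)             # project to vocab logits
    sample_ids = helper.sample(                               # helper picks the output token
        time=time, outputs=cell_outputs, state=cell_state)
    finished, next_inputs, next_state = helper.next_inputs(   # helper picks the next input
        time=time, outputs=cell_outputs, state=cell_state, sample_ids=sample_ids)
    return cell_outputs, sample_ids, next_inputs, next_state, finished
```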
There are different implementations of the Helper interface, e.g.:

- tf.contrib.seq2seq.TrainingHelper returns the next decoder input (usually the ground truth). This helper is used in training, as indicated in the tutorial.
- tf.contrib.seq2seq.GreedyEmbeddingHelper discards the inputs and returns the argmax-sampled token from the previous output. NMT uses this helper in inference when the sampling_temperature hyper-parameter is 0.
- tf.contrib.seq2seq.SampleEmbeddingHelper does the same, but samples the token according to a categorical (a.k.a. generalized Bernoulli) distribution. NMT uses this helper in inference when sampling_temperature > 0.
- ...
The code is in the BaseModel._build_decoder method.
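As a rough sketch of that switch (assuming, not quoting, the NMT code; names like decoder_emb_inp, embedding_decoder, sos_id, and eos_id are placeholders), the helper selection looks something like this:

```python
import tensorflow as tf

if mode == "train":
    # Feed the ground-truth target tokens at each step (teacher forcing).
    helper = tf.contrib.seq2seq.TrainingHelper(
        inputs=decoder_emb_inp,            # embedded target inputs, [batch, time, emb]
        sequence_length=decoder_lengths)
elif sampling_temperature == 0.0:
    # Greedy decoding: next input is the embedding of the argmax token.
    helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
        embedding=embedding_decoder,
        start_tokens=tf.fill([batch_size], sos_id),
        end_token=eos_id)
else:
    # Sampled decoding: next token is drawn from the output distribution.
    helper = tf.contrib.seq2seq.SampleEmbeddingHelper(
        embedding=embedding_decoder,
        start_tokens=tf.fill([batch_size], sos_id),
        end_token=eos_id,
        softmax_temperature=sampling_temperature)

decoder = tf.contrib.seq2seq.BasicDecoder(
    cell=decoder_cell, helper=helper, initial_state=encoder_final_state)
outputs, final_state, _ = tf.contrib.seq2seq.dynamic_decode(
    decoder, maximum_iterations=max_decode_length)
```

At inference the two embedding helpers build the first input from start_tokens and every later input from the previously sampled token, so the training-time decoder inputs are never read.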
Note that both GreedyEmbeddingHelper and SampleEmbeddingHelper don't care what the decoder input is. So in fact you can feed anything, but the zero tensor is the standard choice.
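For example, if your graph keeps a single decoder-input placeholder for both training and inference, an inference call could look like the following sketch (the placeholder and tensor names here are hypothetical, not from the NMT repo):

```python
import numpy as np

# Hypothetical inference call: the inference helpers above never read the
# decoder-input placeholder, so a zero tensor of the right shape is fine.
batch_size, max_target_len = 1, 50
feed = {
    encoder_inputs_ph: encoded_user_message,                   # the actual query tokens
    decoder_inputs_ph: np.zeros((batch_size, max_target_len),  # ignored at inference
                                dtype=np.int32),
}
reply_token_ids = sess.run(inference_sample_ids, feed_dict=feed)
```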

-
If we feed in a vector of zeros as the 'start token' at inference time, do we need to prepend zeros to the target words during the training process for consistency? – Eweler Mar 30 '18 at 04:55
-
In inference, in contrast with training, the input vector is not used. In training it is used, so it must be sensible – Maxim Mar 30 '18 at 06:22
-
So e.g., if a single example of my tokenized training inputs is [1,2,3,4], should I add the start token 0 to make it [0,1,2,3,4] for correct behaviour, given that we prepend zeros at inference? – Eweler Mar 30 '18 at 07:14