
I have seen many people working on Neural Machine Translation. Usually, they wrap their sentences in <BOS>/<EOS>, <START>/<END>, etc. tags before training the network. Of course it's a logical way to mark the start and end of a sentence, but I wonder: how does the neural network understand that the string <END> (or similar) means the end of a sentence?

Burhan Bilen

1 Answer


It doesn't.

At inference time, there's a hardcoded rule that if that token is generated, the sequence is done, and the underlying neural model will no longer be asked for the next token.

# split the source sentence into tokens, adding the <BOS>/<EOS> markers
source_seq = tokenize('This is not a test.')
print(source_seq)

At this point you'd get something like:

[ '<BOS>', 'Thi###', ... , '###t', '.' , '<EOS>' ]

Now we build the target sequence with the same format:

target_seq = [ '<BOS>' ]

while True:
    # ask the model for the most likely next token, given the full source
    # sequence and the target sequence generated so far
    token = model.generate_next_token(source_seq, target_seq)
    if token == '<EOS>':
        break   # the hardcoded stopping rule: generation ends here
    target_seq.append(token)

The model itself only predicts the most likely next token given the current state (the input sequence and the output sequence so far).
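
To make that concrete, here is a minimal sketch of the last step inside such a model (my illustration, not the actual model: the name pick_next_token, the toy vocabulary, the logits array, and the greedy argmax are all assumptions). The point is that '<EOS>' is just an ordinary row in the output distribution over the vocabulary, with no special machinery attached to it:

import numpy as np

# hypothetical toy vocabulary; '<EOS>' has no special status here
vocab = ['<BOS>', '<EOS>', 'this', 'is', 'not', 'a', 'test', '.']

def pick_next_token(logits):
    # logits: one raw score per vocabulary entry, produced by the network
    # for the current state (source_seq plus the target_seq built so far)
    probs = np.exp(logits - logits.max())   # softmax, numerically stabilised
    probs /= probs.sum()
    return vocab[int(np.argmax(probs))]     # greedy decoding: take the most likely token

# If the network scores the '<EOS>' row highest, '<EOS>' is returned like any
# other token; the while-loop above is what turns it into an actual stop.
print(pick_next_token(np.array([0.1, 3.2, 0.5, 0.2, 0.1, 0.0, 0.4, 0.3])))  # -> <EOS>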

It can't exit the loop any more than it can pull your machine's plug out of the wall.

Note that that's not the only hardcoded rule here. The other one is the decision to start from the first token and only ever append - never prepend, never delete... - like a human speaking.
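
A side note on the infinite-loop risk raised in the comments below: if the model never produces '<EOS>', the loop above would run forever, so real decoders usually add a hard length cutoff as well. A sketch of that extra rule, reusing model and source_seq from above; the limit of 200 is an arbitrary choice, not something the model learns:

MAX_LEN = 200   # arbitrary safety limit

target_seq = [ '<BOS>' ]
while True:
    token = model.generate_next_token(source_seq, target_seq)
    if token == '<EOS>' or len(target_seq) >= MAX_LEN:
        break   # stop on '<EOS>' or once the hard length limit is reached
    target_seq.append(token)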

Adam Bittlingmayer
  • I will approve your answer but can you explain a bit more, if possible? For example, what if I don't obey this rule and don't put any tokens in my sentences? – Burhan Bilen Feb 21 '21 at 13:00
  • @BurhanBilen Then the model will have no chance to learn to predict that token, and you'll get an effectively infinite loop. – Adam Bittlingmayer Feb 22 '21 at 07:43
  • Then it’ll generate infinitely long sentences, unless you use some other rule to cut it off, like “stop if length is greater than 200”. – Arya McCarthy Feb 22 '21 at 07:44
  • Autoregressive sequence generation (you’d know if you weren’t doing this) works by producing one word at a time, until you produce the special ‘stop-sign’ word EOS. – Arya McCarthy Feb 22 '21 at 07:44
  • Thank you very much, very nice and clear explanations from both of you, now it's more understandable and clear. – Burhan Bilen Feb 22 '21 at 08:40
  • Burhan, I added a note which may interest you as a speaker of both a left-branching language and a right-branching language. – Adam Bittlingmayer Feb 23 '21 at 06:45
  • @AdamBittlingmayer Thanks a lot, Adam. I appreciate your help, that will also be much helpful to other people. – Burhan Bilen Mar 02 '21 at 07:23