I read a paper about machine translation, and it uses a projection layer. Its encoder has 6 bidirectional LSTM layers. If the input embedding dimension is 512, what will the dimension of the encoder output be? 512*2**5?

The paper's link: https://www.aclweb.org/anthology/P18-1008.pdf

kintsuba

1 Answer


Not quite. Unfortunately, Figure 1 in the mentioned paper is a bit misleading. The six encoder layers are not in parallel, as the figure might suggest, but successive: the hidden state/output of each layer is fed as input to the next layer.

This, together with the fact that the input (embedding) dimension is NOT the output dimension of the LSTM layer (for a bidirectional layer it is 2 * hidden_size), means your output dimension is exactly that: 2 * hidden_size, before it is passed into the final projection layer, which again changes the dimension depending on your specifications.
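
To make the arithmetic concrete, here is a minimal sketch in plain PyTorch (the hidden_size and projection size are hypothetical choices, not taken from the paper): the bidirectional LSTM output is 2 * hidden_size, and the projection layer then maps it to whatever size you specify.

import torch
import torch.nn as nn

embedding_dim = 512   # input embedding dimension from the question
hidden_size = 512     # hypothetical choice, not taken from the paper
proj_dim = 512        # hypothetical projection size

lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_size,
               bidirectional=True, batch_first=True)
projection = nn.Linear(2 * hidden_size, proj_dim)

x = torch.randn(8, 20, embedding_dim)   # (batch, sequence length, embedding dim)
out, _ = lstm(x)
print(out.shape)               # torch.Size([8, 20, 1024]) -> 2 * hidden_size
print(projection(out).shape)   # torch.Size([8, 20, 512])  -> proj_dim after the projection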

It is not quite clear to me what the "add" in the layer description does, but if you look at a reference implementation it seems to be irrelevant to the answer. Specifically, observe how the encoding function is basically:

def encode(...):
    # Embed the source tokens once; this is the input to the first layer.
    encode_inputs = self.embed(...)
    for l in range(num_layers):
        # Keep the previous layer's output around (used for the "add" step).
        prev_input = encode_inputs

        # The l-th bidirectional LSTM layer consumes the previous layer's output.
        encode_inputs = self.nth_layer(...)
        # ...

Obviously, there is a bit more happening here, but this illustrates the basic functional block of the network.
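
To see why stacking the layers successively does not multiply the width, here is a rough self-contained sketch (again plain PyTorch with assumed sizes, not the paper's code): after six bidirectional layers the output is still 2 * hidden_size, not 512 * 2**5 as guessed in the question.

import torch
import torch.nn as nn

embedding_dim = 512   # from the question
hidden_size = 512     # hypothetical choice
num_layers = 6

# The first layer reads the embeddings; every later layer reads the previous
# layer's 2 * hidden_size output, so the width never grows beyond that.
layers = nn.ModuleList(
    nn.LSTM(input_size=embedding_dim if i == 0 else 2 * hidden_size,
            hidden_size=hidden_size, bidirectional=True, batch_first=True)
    for i in range(num_layers)
)

x = torch.randn(8, 20, embedding_dim)  # (batch, sequence length, embedding dim)
for layer in layers:
    x, _ = layer(x)

print(x.shape)  # torch.Size([8, 20, 1024]) -- still 2 * hidden_size after 6 layers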

dennlinger
  • Thanks! The reference implementation is really helpful for me. – kintsuba Feb 18 '20 at 02:50
  • If this is answering your question, please consider [accepting](https://stackoverflow.com/help/someone-answers) the answer to mark this question as "Done". – dennlinger Feb 18 '20 at 08:48