I am doing a sequence classification task using nn.TransformerEncoder(), whose pipeline is similar to nn.LSTM().
I have tried several temporal feature fusion methods:
1. Selecting the final output as the representation of the whole sequence.
2. Using an affine transformation to fuse these features.
3. Classifying the sequence frame by frame, and then selecting the max values as the category of the whole sequence.
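To make the three fusion methods concrete, here is a minimal sketch of each (the shapes, names, and dimensions are my own assumptions, with encoder output of shape T*N*D):

```python
import torch

T, N, D, C = 10, 4, 16, 4            # seq len, batch, feature dim, num classes
out = torch.randn(T, N, D)           # encoder output, shape T*N*D
fc = torch.nn.Linear(D, C)           # shared classification head

# 1) Last time step as the sequence representation
rep_last = out[-1]                   # N*D
logits_last = fc(rep_last)           # N*C

# 2) Affine fusion over time: flatten all T steps, project back to D
fuse = torch.nn.Linear(T * D, D)
rep_affine = fuse(out.permute(1, 0, 2).reshape(N, T * D))  # N*D
logits_affine = fc(rep_affine)       # N*C

# 3) Frame-by-frame classification, then max over time
frame_logits = fc(out)               # T*N*C
logits_max = frame_logits.max(dim=0).values  # N*C
```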
But all three of these methods give terrible accuracy, only 25% on a 4-category classification task, while using nn.LSTM with the last hidden state I can easily achieve 83% accuracy. I have tried plenty of hyperparameters for nn.TransformerEncoder(), but none of them improved the accuracy.
I have no idea how to adjust this model now. Could you give me some practical advice? Thanks.
For the LSTM, the forward() is:
def forward(self, x_in, x_lengths, apply_softmax=False):
    # Embed
    x_in = self.embeddings(x_in)
    # Feed into the LSTM; nn.LSTM returns (output, (h_n, c_n)).
    # Note: x_lengths is currently unused.
    out, (h_n, c_n) = self.LSTM(x_in)  # shape of out: T*N*D
    # Gather the last hidden state
    out = out[-1, :, :]  # N*D
    # FC layers
    z = self.dropout(out)
    z = self.fc1(z)
    z = self.dropout(z)
    y_pred = self.fc2(z)
    if apply_softmax:
        y_pred = F.softmax(y_pred, dim=1)
    return y_pred
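One thing worth double-checking in this forward(): out[-1] only picks the true last hidden state if every sequence in the batch has length T. With padded batches, the last relevant state has to be gathered per sequence using x_lengths. A sketch of that gather (tensor names and shapes are assumptions):

```python
import torch

def gather_last_relevant(out, x_lengths):
    """Pick each sequence's last valid output from a padded batch.

    out:       T*N*D padded encoder/RNN outputs
    x_lengths: N true sequence lengths
    out[-1] would read the padded tail of shorter sequences;
    indexing each sequence at its own last valid step avoids that.
    """
    T, N, D = out.shape
    idx = (x_lengths - 1).clamp(min=0)   # N, last valid index per sequence
    return out[idx, torch.arange(N)]     # N*D

out = torch.randn(5, 3, 8)
lengths = torch.tensor([5, 3, 1])
rep = gather_last_relevant(out, lengths)  # rep[1] == out[2, 1]
```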
For the Transformer:
def forward(self, x_in, x_lengths, apply_softmax=False):
    # Embed
    x_in = self.embeddings(x_in)
    # Feed into the Transformer encoder (x_lengths is currently unused)
    out = self.transformer(x_in)  # shape of out: T*N*D
    # Gather the last hidden state
    out = out[-1, :, :]  # N*D
    # FC layers
    z = self.dropout(out)
    z = self.fc1(z)
    z = self.dropout(z)
    y_pred = self.fc2(z)
    if apply_softmax:
        y_pred = F.softmax(y_pred, dim=1)
    return y_pred
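Since x_lengths is never passed to self.transformer above, padded positions are attended to like real tokens. nn.TransformerEncoder accepts a src_key_padding_mask for this; a hedged sketch of masking plus mean pooling over valid steps only (all shapes, names, and hyperparameters here are my own assumptions, not from the model above):

```python
import torch
import torch.nn as nn

T, N, D = 12, 4, 32                     # seq len, batch, model dim (assumed)
x = torch.randn(T, N, D)                # already-embedded input, T*N*D
x_lengths = torch.tensor([12, 9, 5, 2]) # true lengths per sequence

layer = nn.TransformerEncoderLayer(d_model=D, nhead=4)
encoder = nn.TransformerEncoder(layer, num_layers=2)

# True where a position is padding; src_key_padding_mask expects N*T
pad_mask = torch.arange(T)[None, :] >= x_lengths[:, None]  # N*T
out = encoder(x, src_key_padding_mask=pad_mask)            # T*N*D

# Mean-pool over the valid time steps only
valid = (~pad_mask).T.unsqueeze(-1).float()                  # T*N*1
rep = (out * valid).sum(dim=0) / x_lengths[:, None].float()  # N*D
```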