I'm doing sentiment analysis of Spanish tweets.

After reviewing some of the recent literature, I've seen that there has recently been an effort to train a RoBERTa model exclusively on Spanish text (roberta-base-bne). It seems to outperform BETO, the previous state-of-the-art model for Spanish language modeling.

This RoBERTa model has been trained on a variety of tasks, but text classification is not among them. I want to take this RoBERTa model and fine-tune it for text classification, more specifically sentiment analysis.

I've done all the preprocessing and created the dataset objects, and now want to train the model natively with TensorFlow.
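For context, I built the dataset objects roughly like this (a minimal sketch; the `texts` and `labels` values stand in for my actual preprocessed tweets and their integer class ids):

import tensorflow as tf
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BSC-TeMU/roberta-base-bne")

texts = ["me encanta este producto", "no me gusta nada"]  # placeholder tweets
labels = [1, 0]                                           # placeholder integer class ids

# Tokenize and wrap the encodings plus labels into a tf.data.Dataset
encodings = tokenizer(texts, truncation=True, padding=True)
train_dataset = tf.data.Dataset.from_tensor_slices((dict(encodings), labels))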

Code

# Training with native TensorFlow

import tensorflow as tf

from transformers import TFRobertaForSequenceClassification

model = TFRobertaForSequenceClassification.from_pretrained("BSC-TeMU/roberta-base-bne")

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss)  # can also use any keras loss fn
# note: batch_size must not be passed to fit() when the input is an already-batched tf.data.Dataset
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3)

Question
My question is regarding TFRobertaForSequenceClassification:
Is it correct to use this class, even though it isn't specified in the model card, instead of the AutoModelForMaskedLM that the model card does specify?

And by simply using TFRobertaForSequenceClassification, can we assume that the pretrained knowledge will automatically be applied to the new task, namely text classification?


1 Answer


The model class named in the model card essentially reflects what the model has been trained on. If you are familiar with architectural choices for different modeling tasks (e.g., token classification vs. sequence classification), it should become clear that these models have slightly different layouts, specifically in the layers after the Transformer output layer. For sequence classification, this is (generally speaking) Dropout and an additional linear layer, mapping from the hidden_size of the model to the number of output classes. See here for an example with BERT.
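To make that layout concrete, here is a rough sketch of such a classification head in Keras (an illustration only; the actual head inside TFRobertaForSequenceClassification differs slightly in its details):

import tensorflow as tf

hidden_size = 768  # hidden size of roberta-base
num_labels = 3     # e.g. negative / neutral / positive

# Layers stacked on top of the Transformer output for classification
classifier_head = tf.keras.Sequential([
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(num_labels),  # maps hidden_size -> num_labels logits
])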

This means that a model checkpoint which was pre-trained with a different learning objective will not have weights for this final layer; instead, you train these (comparatively few) parameters during your fine-tuning. In fact, for PyTorch models you will generally get a warning when loading a checkpoint whose available weights differ slightly from the architecture:

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: [...]

  • This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). [...]
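You can reproduce this yourself by loading a masked-LM checkpoint into a classification class:

from transformers import BertForSequenceClassification

# The classifier weights do not exist in the checkpoint, so they are
# randomly initialized and the warning quoted above is printed.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")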

This is exactly what you are doing, so as long as you have a decent number of fine-tuning examples (depending on the number of classes; as a rule of thumb, I would suggest 10e3-10e4), this will not affect your training much.

I want to point out, however, that it might be necessary for you to specify the number of labels for your sequence classification head. You can do this when loading your model:

from transformers import TFRobertaForSequenceClassification

# num_labels sets the output size of the freshly initialized classification
# head, e.g. 3 for negative/neutral/positive sentiment
roberta = TFRobertaForSequenceClassification.from_pretrained("BSC-TeMU/roberta-base-bne",
                                                             num_labels=<your_value>)
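If the checkpoint only ships PyTorch weights (which, judging by the comments below, was the case here), loading it into the TF class additionally requires `from_pt=True`:

roberta = TFRobertaForSequenceClassification.from_pretrained("BSC-TeMU/roberta-base-bne",
                                                             num_labels=<your_value>,
                                                             from_pt=True)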
  • Thanks a lot for the detailed explanation. I had to make some changes to the code (see above): I added the `num_labels` and `from_pt` arguments when loading the model. However, when I now try to train the model, I get the following error: `UnimplementedError: Cast string to float is not supported [[node Cast (defined at :9) ]] [Op:__inference_train_function_45018] Function call stack: train_function` – LeLuc Sep 28 '21 at 18:51
  • Sorry, I think I should have clarified that this "recipe" is specifically only related to loading the TF model. I think altering the question afterwards is also not a good idea, since it doesn't reflect the actual problem state at the time of writing my answer. Instead, I would suggest that you open a new question with the updated code, and, if possible, revert your question here to the state it was before. – dennlinger Sep 28 '21 at 20:03
  • Okay, understood. I just did that. I also managed to solve the above error by converting my labels to ints. The model is currently training. – LeLuc Sep 28 '21 at 20:31
  • Also, just to clarify: the problem above wasn't related to loading a TF model at all. That is something I realized afterward and added to the code so that the question would be understood properly and not be confused with the TF issue. Thanks a lot anyway for the hint. – LeLuc Sep 28 '21 at 20:39
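For anyone hitting the same `Cast string to float is not supported` error: it usually means the labels in the dataset are still strings. A minimal sketch of the fix LeLuc describes (the label names here are assumptions):

# Map string labels to integer class ids before building the dataset
label2id = {"negative": 0, "neutral": 1, "positive": 2}  # assumed label set
labels = ["positive", "negative"]                        # example string labels
labels = [label2id[label] for label in labels]           # -> [2, 0]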