
I'm working with a Spanish RoBERTa model that was recently pre-trained and has been fine-tuned for a variety of NLP tasks, but not for text classification.

Since the baseline model seems promising, I want to fine-tune it for a different task: text classification, more precisely sentiment analysis of Spanish tweets, and then use it to predict labels for tweets I have scraped.

The preprocessing and the training seem to work correctly. However, I don't know how to use this model afterwards for prediction.

I'll leave out the preprocessing part because it doesn't seem to be where the issue lies.
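For reference, that omitted part just produces the `train_dataset` used below, roughly along these lines (a sketch; `train_texts` and `train_labels` are placeholders for my own lists):

import tensorflow as tf
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BSC-TeMU/roberta-base-bne")

# train_texts: list of tweet strings; train_labels: list of ints in {0, 1, 2}
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
train_dataset = tf.data.Dataset.from_tensor_slices((dict(train_encodings), train_labels))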

Code:

# Training with native TensorFlow
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

## Model Definition
model = TFAutoModelForSequenceClassification.from_pretrained("BSC-TeMU/roberta-base-bne", from_pt=True, num_labels=3)

## Model Compilation
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.metrics.SparseCategoricalAccuracy()
model.compile(optimizer=optimizer,
              loss=loss,
              metrics=[metric])

## Fitting the data (the dataset is batched via .batch(64), so no batch_size argument is needed)
history = model.fit(train_dataset.shuffle(1000).batch(64), epochs=5)

Output:

/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py:337: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
  "Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 "
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFRobertaForSequenceClassification: ['roberta.embeddings.position_ids']
- This IS expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
Some weights or buffers of the TF 2.0 model TFRobertaForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/5
16/16 [==============================] - 35s 1s/step - loss: 1.0455 - sparse_categorical_accuracy: 0.4452
Epoch 2/5
16/16 [==============================] - 18s 1s/step - loss: 0.6923 - sparse_categorical_accuracy: 0.7206
Epoch 3/5
16/16 [==============================] - 18s 1s/step - loss: 0.3533 - sparse_categorical_accuracy: 0.8885
Epoch 4/5
16/16 [==============================] - 18s 1s/step - loss: 0.1871 - sparse_categorical_accuracy: 0.9477
Epoch 5/5
16/16 [==============================] - 18s 1s/step - loss: 0.1031 - sparse_categorical_accuracy: 0.9714

Question:

How can I use the model after fine-tuning for text classification/sentiment analysis? (I want to predict a label for each tweet I scraped.)
What would be a good way of approaching this?

I've tried to save the model, but I don't know where it is stored or how to use it afterwards:

# Save the model (writes the config and weights to ./Twitter_Roberta_Model/,
# relative to the current working directory)
model.save_pretrained('Twitter_Roberta_Model')

I've also tried to add it to a HuggingFace pipeline, like the following, but I'm not sure whether this works correctly.

from transformers import pipeline, AutoTokenizer

classifier = pipeline('sentiment-analysis',
                      model=model,
                      tokenizer=AutoTokenizer.from_pretrained("BSC-TeMU/roberta-base-bne"))
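If this is a valid approach, I would then run it over the scraped tweets roughly like this (`scraped_tweets` is a placeholder for my list of tweet strings):

# scraped_tweets is a placeholder for my list of tweet strings
predictions = classifier(scraped_tweets)
# each element is a dict like {'label': 'LABEL_0', 'score': 0.98}
# (the label names default to 'LABEL_<id>' unless id2label is set in the model config)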

1 Answer


Although this is an example for a specific model (DistilBert), the following prediction code should work similarly, with small modifications according to your needs: replace the DistilBert classes with the ones for your model (TFAutoModelForSequenceClassification) and make sure the proper tokenizer is used.

    import tensorflow as tf
    from transformers import TFDistilBertForSequenceClassification, DistilBertTokenizer

    # Rebuild the architecture, then load the fine-tuned weights
    loaded_model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
    loaded_model.load_weights('./distillbert_tf.h5')

    # Tokenize the input text the same way as during training
    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    input_text = "The text on which I test"
    input_text_tokenized = tokenizer.encode(input_text,
                                            truncation=True,
                                            padding=True,
                                            return_tensors="tf")

    # The model outputs logits; turn them into probabilities with softmax
    prediction = loaded_model(input_text_tokenized)
    prediction_logits = prediction[0]
    prediction_probs = tf.nn.softmax(prediction_logits, axis=1).numpy()
    print(f'The prediction probs are: {prediction_probs}')
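Adapted to your model, the same steps would look roughly like this (a sketch; it assumes the `Twitter_Roberta_Model` directory you created with `save_pretrained` and the original tokenizer):

    import tensorflow as tf
    from transformers import TFAutoModelForSequenceClassification, AutoTokenizer

    # Reload the fine-tuned model from the directory written by save_pretrained()
    loaded_model = TFAutoModelForSequenceClassification.from_pretrained('Twitter_Roberta_Model')
    tokenizer = AutoTokenizer.from_pretrained("BSC-TeMU/roberta-base-bne")

    tweet = "Example scraped tweet"  # placeholder text
    inputs = tokenizer(tweet, truncation=True, padding=True, return_tensors="tf")
    probs = tf.nn.softmax(loaded_model(inputs).logits, axis=1).numpy()
    predicted_class = int(probs.argmax(axis=1)[0])  # one of 0, 1, 2
    print(predicted_class, probs)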
  • Is there a way to load my model after doing `model.save_pretrained('Twitter_Roberta_Model')`? Or can I just continue with, for example, `prediction = model(input_text_tokenized)`, since I've already created the `model` variable earlier in my code during training? And what about the second approach I mention, with the HuggingFace pipeline: would that be a valid way to do it, or is there something I'm missing? – LeLuc Sep 29 '21 at 10:19
  • I've never used a HuggingFace pipeline, so in that regard I can't help you and don't know a valid answer at the moment. And yes, if you've just trained the model and don't want to reload it, you can of course use your `model` variable directly; just ensure you use the right tokenizer. – Timbus Calin Sep 29 '21 at 10:20
  • I've just tried your code and it seems to work well. Thank you! Would you happen to know how I can also show the labels, e.g. in the form of a dictionary? In my case the labels just happen to be integers, but it'd be great to be able to show them. – LeLuc Sep 29 '21 at 10:29
  • If you know that label 0 is positive and 1 is negative, then you could use tf.argmax()/np.argmax() on the prediction_probs; if the result of argmax() is 0 you print 'positive', and if the result is 1 you print 'negative'. – Timbus Calin Sep 29 '21 at 10:31
  • Once you obtain the result of argmax() -> label 0 or 1 as a prediction, you can create a dictionary or any other structure you wish (see the small sketch after these comments). – Timbus Calin Sep 29 '21 at 10:32
  • Thanks again. FYI, I get the same results with both methods, i.e. the HuggingFace pipeline and your code. – LeLuc Sep 29 '21 at 10:33
  • That is great, it means both solutions are correct. – Timbus Calin Sep 29 '21 at 10:33
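
A small sketch of the label mapping discussed in the comments (the label names and their order are assumptions; adjust them to however your dataset encodes its classes):

    import numpy as np

    # Assumed id-to-name mapping; verify against how your labels were encoded
    id2label = {0: 'negative', 1: 'neutral', 2: 'positive'}
    pred_ids = np.argmax(prediction_probs, axis=1)
    pred_labels = [id2label[int(i)] for i in pred_ids]
    print(pred_labels)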