We can create a model with the AutoModel (or TFAutoModel) class:

from transformers import AutoModel
model = AutoModel.from_pretrained('distilbert-base-uncased')

Alternatively, a model can be created with AutoModelForSequenceClassification (or TFAutoModelForSequenceClassification):

from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased')

As far as I know, both load the distilbert-base-uncased checkpoint to create the model. Judging by its name, the second class (AutoModelForSequenceClassification) is meant for sequence classification.

But what are the actual differences between the two classes, and how should each be used correctly?

(I searched the Hugging Face documentation, but it was not clear to me.)

Tan Phan
  • 1
    I found one difference: with *AutoModel*, we can use **last_hidden_state** to get the [CLS] token representation. With *AutoModelForSequenceClassification*, **last_hidden_state** does not exist in the output. – Tan Phan Nov 10 '21 at 09:38

1 Answer

The difference is that AutoModelForSequenceClassification adds a classification head on top of the base model's outputs, and that head can easily be trained together with the base model. AutoModel returns only the base model's hidden states, with no task-specific head.
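
To make the difference concrete, here is a minimal sketch that compares the outputs of the two classes. It assumes PyTorch and transformers are installed, and it builds a tiny, randomly initialised DistilBERT from a config (the sizes and token ids are made up for illustration) so that nothing has to be downloaded. With a real checkpoint you would call from_pretrained('distilbert-base-uncased') instead.

```python
import torch
from transformers import (
    AutoModel,
    AutoModelForSequenceClassification,
    DistilBertConfig,
)

# Tiny, randomly initialised config (illustrative sizes, not a real checkpoint).
config = DistilBertConfig(
    vocab_size=100, dim=32, hidden_dim=64, n_layers=1, n_heads=2, num_labels=2
)

base = AutoModel.from_config(config).eval()
clf = AutoModelForSequenceClassification.from_config(config).eval()

# One sequence of four made-up token ids.
input_ids = torch.tensor([[1, 5, 7, 2]])
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    base_out = base(input_ids=input_ids, attention_mask=attention_mask)
    clf_out = clf(input_ids=input_ids, attention_mask=attention_mask)

# Base model: one hidden vector per token -> (batch, seq_len, dim)
hidden_shape = tuple(base_out.last_hidden_state.shape)
# Classification model: one score per label -> (batch, num_labels)
logits_shape = tuple(clf_out.logits.shape)

print("AutoModel last_hidden_state:", hidden_shape)
print("AutoModelForSequenceClassification logits:", logits_shape)
```

The classification model's output exposes logits rather than last_hidden_state, which matches the observation in the comment under the question.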

Barbara Gendron
subho
  • So can I use **AutoModel** for *classification* purposes? – Tan Phan Dec 05 '21 at 09:19
  • No. As the Hugging Face course puts it: "For our example, we will need a model with a sequence classification head (to be able to classify the sentences as positive or negative). So, we won’t actually use the AutoModel class, but AutoModelForSequenceClassification." See https://huggingface.co/course/chapter2/2?fw=pt – subho Dec 07 '21 at 14:33
  • 2
    Does that mean ```AutoModel``` has frozen weights while ```AutoModelForSequenceClassification``` has trainable weights? Actually, I have a requirement where I only want the model to act as a feature extractor and not as a trainable model. – Lawhatre Dec 30 '21 at 17:06
  • @subho What is a classification head? – etang Jan 26 '23 at 01:38
  • 1
    @subho: what is the classification head? Is it a linear layer with D_in=number_of_classes? Any online pointer would be very useful! – Betty Feb 11 '23 at 12:58
  • It depends on the architecture. This is the head used for RoBERTa, for example: https://github.com/huggingface/transformers/blob/6112b1c6442aaf7affd2b0676a1cd4eee30c45cf/src/transformers/models/roberta/modeling_roberta.py#L1424 – David Waterworth Jul 20 '23 at 02:11
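
On the questions raised in the comments about frozen weights and what a head looks like: neither class freezes weights by default, so freezing has to be done explicitly. The sketch below is one possible way to use AutoModel as a frozen feature extractor with a small hand-written head on top. It assumes PyTorch and transformers are installed, again uses a tiny random DistilBERT config with made-up sizes, and the head layout (linear, ReLU, dropout, linear) is only an illustration in the spirit of DistilBERT's head; other architectures use different heads.

```python
import torch
from torch import nn
from transformers import AutoModel, DistilBertConfig

# Tiny, randomly initialised encoder (illustrative sizes, not a real checkpoint).
config = DistilBertConfig(vocab_size=100, dim=32, hidden_dim=64, n_layers=1, n_heads=2)
encoder = AutoModel.from_config(config)

# Freeze every encoder parameter so it only extracts features.
for param in encoder.parameters():
    param.requires_grad = False
encoder.eval()

# Hypothetical classification head: maps the first-token vector to 2 class scores.
num_classes = 2
head = nn.Sequential(
    nn.Linear(config.dim, config.dim),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(config.dim, num_classes),
)

# Two sequences of four made-up token ids.
input_ids = torch.tensor([[1, 5, 7, 2], [1, 9, 3, 2]])

with torch.no_grad():
    # Vector of the first token ([CLS] position) for each sequence.
    features = encoder(input_ids=input_ids).last_hidden_state[:, 0]

logits = head(features)  # only the head's parameters receive gradients

trainable_encoder_params = sum(p.requires_grad for p in encoder.parameters())
print(tuple(features.shape), tuple(logits.shape), trainable_encoder_params)
```

In this layout the final linear layer takes the hidden size as input and produces one score per class, so the number of classes is the output dimension rather than the input dimension.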