The task we are talking about is called Zero-Shot Topic Classification: predicting topics the model has never been trained on. This paradigm is supported by the Hugging Face transformers library; you can read more here. The most commonly used pre-trained model is Bart Large MNLI, that is, the bart-large checkpoint fine-tuned on the MultiNLI (MNLI) dataset.
Here is a simple example showing the classification of the phrase "I like hot dogs" without any preliminary training:
First of all, please install the transformers library:
pip install --upgrade transformers
Then import and initialize the pipeline:
from transformers import pipeline
classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')
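The first call downloads the model weights (roughly 1.6 GB), so it may take a while. If you have a GPU, you can pass the device argument to run inference on it:

classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli', device=0)  # device=0 is the first CUDA GPU; omit it to stay on CPU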
Then define the candidate labels, the hypothesis template, and the sequence to classify:
labels = ["artifacts", "animals", "food", "birds"]
hypothesis_template = 'This text is about {}.'
sequence = "I like hot dogs"
Predict the label:
# multi_label replaces the deprecated multi_class argument
prediction = classifier(sequence, labels, hypothesis_template=hypothesis_template, multi_label=True)
print(prediction)
The output will be something like:
`{'sequence': 'i like hot dogs',
'labels': ['food', 'animals', 'artifacts', 'birds'],
'scores': [0.9971900582313538, 0.00529429130256176, 0.0020991512574255466, 0.00023589911870658398]}`
Here the model assigns by far the highest score (about 0.997) to the label 'food', which is the correct answer.
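For the curious: zero-shot classification works because the pipeline recasts each candidate label as an NLI hypothesis built from the template (e.g. 'This text is about food.') and lets the MNLI-trained model judge whether the input text entails it. With multi_label=True, each label is scored independently by softmaxing its entailment logit against its contradiction logit, which is why the scores do not have to sum to 1; with the default multi_label=False, the entailment logits are instead softmaxed across all candidate labels. Below is a minimal sketch of that mechanism for a single label, assuming the (contradiction, neutral, entailment) logit order this checkpoint uses (check model.config.id2label if in doubt):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')
model = AutoModelForSequenceClassification.from_pretrained('facebook/bart-large-mnli')

premise = 'I like hot dogs'
hypothesis = 'This text is about food.'  # the label inserted into the hypothesis template

inputs = tokenizer(premise, hypothesis, return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits  # three logits: contradiction, neutral, entailment

# drop 'neutral' and softmax contradiction vs. entailment;
# the entailment probability is the pipeline's score for 'food'
entailment_prob = torch.softmax(logits[0, [0, 2]], dim=0)[1].item()
print(entailment_prob)  # should match the 'food' score above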