Define synonyms in the tokeniser for Huggingface models

Question

I was hoping I could define synonyms for Hugging Face models. As a minimal example, suppose we have some prompts in which each respondent writes a sentence about which fast food they like:

from transformers import pipeline
pipe = pipeline(model="facebook/bart-large-mnli", device = 0)


prompts = ["I like mcdonalds",
           "I hate maccas",
           "I love burger king",
           "The burgers are better at hungry jacks",
           "My favorite restaurant is wendys"
           ]

result = pipe(prompts,
    candidate_labels = ['likes mcdonalds', 'likes burger king',
                        'does not like mcdonalds', 'does not like burger king'],
    hypothesis_template = "The writer {}.",
    multi_label = True
)

Then I want to define synonyms like the ones below (note that in Australia McDonald's is called "maccas" and Burger King is called "hungry jacks"):

synonyms = {'mcdonalds':'maccas',
            'burger king': 'hungry jacks'
            }

Is there any way to do this without having to retrain the model? I was hoping it might be possible to make the switch in the tokeniser without needing to retrain anything.
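
For reference, the only workaround I can think of that avoids retraining does not touch the tokeniser at all: rewrite each regional name to its canonical form before the text reaches the pipeline. Below is a minimal sketch of that idea, assuming a plain case-insensitive, whole-word regex replacement is good enough for these brand names. `build_normaliser` and `normalise` are hypothetical helper names of my own, not part of transformers, and the mapping is inverted because the dictionary above goes from canonical name to regional name.

```python
import re

# Synonym table from the question: canonical name -> regional name.
synonyms = {
    "mcdonalds": "maccas",
    "burger king": "hungry jacks",
}


def build_normaliser(synonyms):
    """Return a function that rewrites regional names to canonical ones.

    Hypothetical helper: this is plain text pre-processing and does not
    change the tokeniser or the model in any way.
    """
    # Invert the table so each regional variant points at its canonical name.
    variant_to_canonical = {v.lower(): k for k, v in synonyms.items()}
    # Try longer variants first so multi-word names are matched as a whole.
    variants = sorted(variant_to_canonical, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(v) for v in variants) + r")\b",
        flags=re.IGNORECASE,
    )

    def normalise(text):
        # Replace every matched variant with its canonical name.
        return pattern.sub(
            lambda m: variant_to_canonical[m.group(0).lower()], text
        )

    return normalise


normalise = build_normaliser(synonyms)

example_prompts = [
    "I like mcdonalds",
    "I hate maccas",
    "The burgers are better at hungry jacks",
]
normalised_prompts = [normalise(p) for p in example_prompts]

for before, after in zip(example_prompts, normalised_prompts):
    print(f"{before!r} -> {after!r}")
```

The normalised list could then be passed to pipe(...) in place of the raw prompts, with the candidate labels left unchanged. This only catches exact spellings of each variant, so misspellings or possessives would need extra entries in the table. I would still like to know whether the same thing can be done inside the tokeniser.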
