In the current implementation, what is the recommended way to remove tokens from any Hugging Face `PreTrainedTokenizer`? Simply creating a new vocabulary.txt and loading it with `from_pretrained` is deprecated and does not scale to all tokenizers. I know there are dedicated methods for adding tokens, but I have not found any that allow deleting an original token.

So I would like to be able to remove a given set of tokens from any tokenizer's vocabulary and then save the updated tokenizer with `save_pretrained`.
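
For concreteness, here is a minimal sketch of the workflow I have in mind (the `remove_tokens` call is hypothetical and does not exist in the current API; `bert-base-uncased` is just an example checkpoint):

```python
from transformers import AutoTokenizer

# Load any pretrained tokenizer (bert-base-uncased is only an example).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Adding tokens is supported out of the box:
tokenizer.add_tokens(["newtoken1", "newtoken2"])

# ...but I have found no official counterpart for removal.
# Conceptually I am looking for something like the following
# (remove_tokens is hypothetical):
# tokenizer.remove_tokens(["unwanted_token_a", "unwanted_token_b"])

# ...and then I want to persist the trimmed tokenizer:
tokenizer.save_pretrained("./trimmed-tokenizer")
```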