
In the current implementation, what is the recommended way to remove tokens from any Hugging Face PreTrainedTokenizer? Simply creating a new vocabulary.txt and loading it with from_pretrained is deprecated and does not work for all tokenizers. I know there are dedicated methods for adding tokens, but I have not found any that allow deleting an original token.

So I would like to be able to remove a given set of tokens from any tokenizer's vocabulary and then save this updated tokenizer with save_pretrained.
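For context, the closest workaround I have found for fast tokenizers is to edit the serialized state of the backend tokenizer directly. Here is a minimal sketch of that idea, assuming a WordPiece vocabulary such as bert-base-uncased; note that `_tokenizer` is a private attribute behind the `backend_tokenizer` property, and the token strings passed to `remove_tokens` are just placeholders:

```python
import json

from tokenizers import Tokenizer
from transformers import AutoTokenizer


def remove_tokens(tokenizer, unwanted):
    """Drop `unwanted` tokens from a fast WordPiece tokenizer by
    rewriting the backend tokenizer's serialized JSON state."""
    state = json.loads(tokenizer.backend_tokenizer.to_str())
    if state["model"]["type"] != "WordPiece":
        raise ValueError("this sketch only handles WordPiece vocabularies")
    vocab = state["model"]["vocab"]
    # Keep surviving tokens in their original id order, then re-index
    # them contiguously so the id space has no gaps.
    kept = [t for t in sorted(vocab, key=vocab.get) if t not in unwanted]
    state["model"]["vocab"] = {t: i for i, t in enumerate(kept)}
    # _tokenizer is private API; there is no public setter for the backend.
    tokenizer._tokenizer = Tokenizer.from_str(json.dumps(state))
    return tokenizer


tok = AutoTokenizer.from_pretrained("bert-base-uncased")
tok = remove_tokens(tok, {"##humous"})  # hypothetical tokens to delete
tok.save_pretrained("trimmed-tokenizer")
```

This feels fragile, though: it only covers WordPiece (BPE tokenizers would also need their merges pruned), it must not touch special tokens, and shifting the ids means any attached model's embedding matrix has to be shrunk to match. Hence the question about a supported way to do this.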

Bram Vanroy

0 Answers