In the current implementation, what is the recommended way to remove tokens from any Hugging Face `PreTrainedTokenizer`? Simply creating a new vocabulary.txt and loading it with `from_pretrained` is deprecated and does not scale to all tokenizers. I know there are dedicated methods for adding tokens, but I have not found any that allow deleting an original token.

So I would like to be able to remove a given set of tokens from any tokenizer's vocabulary and then save the updated tokenizer with `save_pretrained`.
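
For concreteness, here is a minimal sketch of the workflow I have in mind (the `remove_tokens` call is hypothetical and does not exist in the current API; `bert-base-uncased` is just an example checkpoint):

```python
from transformers import AutoTokenizer

# Load any pretrained tokenizer (bert-base-uncased is only an example).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Adding tokens is supported out of the box:
tokenizer.add_tokens(["newtoken1", "newtoken2"])

# ...but I have found no official counterpart for removal.
# Conceptually I am looking for something like the following
# (remove_tokens is hypothetical):
# tokenizer.remove_tokens(["unwanted_token_a", "unwanted_token_b"])

# ...and then I want to persist the trimmed tokenizer:
tokenizer.save_pretrained("./trimmed-tokenizer")
```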