I'm trying to add a few new words to the vocabulary of a pretrained HuggingFace Transformers model. I did the following to add the new tokens to the tokenizer and resize the model's token embedding matrix to match:
# register the new words as added tokens on the tokenizer
tokenizer.add_tokens(['word1', 'word2', 'word3', 'word4'])
# resize the embedding matrix so the model has one row per token in the tokenizer
model.resize_token_embeddings(len(tokenizer))
print(len(tokenizer))  # outputs len_vocabulary + 4
But after training the model on my corpus and saving it, I found that the saved tokenizer's vocabulary size hadn't changed. On closer inspection, the code above does not change the vocabulary size at all: tokenizer.vocab_size is still the same, and only len(tokenizer) has changed.
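For reference, here is a minimal sketch of what I observe when I check the two values directly (I'm assuming bert-base-uncased as the checkpoint here just for illustration; the exact pretrained model shouldn't matter):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
print(tokenizer.vocab_size)  # 30522 (base vocabulary)
print(len(tokenizer))        # 30522

tokenizer.add_tokens(['word1', 'word2', 'word3', 'word4'])
print(tokenizer.vocab_size)  # still 30522
print(len(tokenizer))        # 30526 = 30522 + 4 added tokens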
So now my question is: what is the difference between tokenizer.vocab_size and len(tokenizer)?