My question is about the CamemBERT model (the French version of BERT) and its tokenizer:
Why does every word in the vocabulary have a "▁" character in front of it? For example, the vocabulary contains "▁sirop" rather than "sirop" (sirop = syrup).
from transformers import CamembertTokenizer

tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
voc = tokenizer.get_vocab()  # vocabulary of the model, as a {token: id} dict
print("sirop" in voc)   # displays False
print("▁sirop" in voc)  # displays True
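To make the behavior easier to reproduce, here is a minimal check that tokenizes a full sentence (the sentence is just an arbitrary example; the exact pieces may differ, but each word-initial piece should come back prefixed with "▁"):

tokens = tokenizer.tokenize("Je mange du sirop")
print(tokens)  # expected: word-initial pieces start with "▁", e.g. something like ['▁Je', '▁mange', '▁du', '▁sirop']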
Thank you for answering :)