I have text that contains custom tokens such as <adjective>, and I am trying to prepare a byte-level tokenizer that won't split them:
from tokenizers.pre_tokenizers import ByteLevel

tokenizer.pre_tokenizer = ByteLevel()
tokenizer.pre_tokenizer.pre_tokenize_str("<adjective>")
[('Ġ<', (0, 2)), ('adjective', (2, 11)), ('>', (11, 12))]
How can I add <adjective> not as a special token, but as a regular token that the tokenizer should not split?
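
In other words, I was hoping for something like the following to work (a minimal sketch, assuming the Hugging Face tokenizers Python API; the BPE model, sample input, and expected output here are only illustrations):

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import ByteLevel

# Tokenizer with the byte-level pre-tokenizer from above
tokenizer = Tokenizer(BPE())
tokenizer.pre_tokenizer = ByteLevel()

# Add <adjective> as a regular (non-special) token; added tokens are,
# as far as I understand, matched before pre-tokenization runs, so
# ByteLevel should never get a chance to split them.
tokenizer.add_tokens(["<adjective>"])

print(tokenizer.encode("<adjective>").tokens)
# expected: ['<adjective>']

If the matching needs tuning (e.g. around normalization or surrounding whitespace), add_tokens also accepts AddedToken objects with lstrip/rstrip/normalized options, as far as I can tell. Is this the right approach, or is there a better way?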