Given a dictionary char_to_idx, how can one create a tokenizer whose token ids are guaranteed to be the same as in char_to_idx?
import tokenizers

char_to_idx = {'a': 0, 'b': 1, 'c': 2, 'd': 3}
tokenizer = tokenizers.Tokenizer(tokenizers.models.Unigram())
# ???
print(tokenizer.get_vocab())
# Desired output: {'a': 0, 'b': 1, 'c': 2, 'd': 3}