I am new to AllenNLP. I use SentencePiece for subword tokenization in my pipeline.
SentencePiece needs a training step to generate a subword model, which can then be used for tokenization.
Is implementing a custom Vocabulary class the right way to do this? I'm a little confused about whether that is the right place, given that there are TokenIndexers for character-level tokenization and the like.