I am trying to set up separate endpoints for tokenization and inference using HuggingFace models. Ideally I would like to use HuggingFace inference endpoints.
Is there a straightforward way to spin up endpoints for encoding, decoding, and inference for the same HF model? Or would I need to create containers for the encoder/decoder myself? I know HF has Inference Endpoints, but I'm not sure how well supported the tokenizer use case is or how I would implement it (e.g. what does the POST request look like for encoding vs. decoding, can I run it on the same infra as the inference endpoint, etc.).
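To make the question concrete, here is roughly the request shape I have in mind: a single endpoint with a hypothetical `task` field to switch between encoding, decoding, and generation. The `task` field and the URL are my own placeholders, not anything documented, so I don't know if the default endpoint payload supports something like this:

```python
import requests

API_URL = "https://my-endpoint.endpoints.huggingface.cloud"  # placeholder URL
HEADERS = {"Authorization": "Bearer hf_xxx", "Content-Type": "application/json"}

# Hypothetical payloads -- the "task" field is my own invention, not a documented parameter.
encode_payload = {"inputs": "Hello world", "task": "encode"}      # text -> token ids
decode_payload = {"inputs": [15496, 995], "task": "decode"}       # token ids -> text
generate_payload = {"inputs": "Hello world", "task": "generate"}  # normal inference

for payload in (encode_payload, decode_payload, generate_payload):
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    print(response.json())
```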
I have tried HF Inference Endpoints for inference, and I see that tokenizers are available, but I'm not sure how to expose the tokenizer's encode/decode through the inference endpoint, or how to do that efficiently.
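In case it helps clarify what I'm after, this is a rough sketch of the kind of custom `handler.py` I imagine deploying, based on the `EndpointHandler` interface for custom Inference Endpoint handlers. The `task` routing field is my own assumption, and I don't know whether this is the recommended way to serve the tokenizer alongside generation on the same infra:

```python
from typing import Any, Dict

from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # Load the tokenizer and model from the repo the endpoint is deployed from.
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(path)

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = data["inputs"]
        task = data.get("task", "generate")  # "task" is a made-up routing field

        if task == "encode":
            # Text -> token ids
            return {"input_ids": self.tokenizer.encode(inputs)}
        if task == "decode":
            # Token ids -> text
            return {"text": self.tokenizer.decode(inputs)}

        # Default: run generation with the same model
        model_inputs = self.tokenizer(inputs, return_tensors="pt")
        output_ids = self.model.generate(**model_inputs, max_new_tokens=64)
        return {"generated_text": self.tokenizer.decode(output_ids[0], skip_special_tokens=True)}
```

Is something along these lines reasonable, or is there a better-supported pattern for this?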