
I am trying to set up separate endpoints for tokenization and inference using HuggingFace models. Ideally I would like to use HuggingFace inference endpoints.

Is there a straightforward way to spin up endpoints for encoding, decoding, and inference for the same HF model? Or would I need to create containers for the encoder/decoder myself? I know HF has inference endpoints, but I'm not sure how well supported the tokenizer use case is or how I would implement it (e.g. what does the POST request look like for encoding vs. decoding, can I run it on the same infrastructure as the inference endpoint, etc.).

I have tried HF inference endpoints for inference, and I see that tokenizers are available, but I am not sure how to implement the tokenizer's encode/decode steps on an inference endpoint, or how to optimize that setup.

1 Answer


You should be able to do what you want by creating a custom handler for an inference endpoint. Have a look at the custom handlers documentation.
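
A custom handler is essentially a handler.py file at the root of your model repository that exposes an EndpointHandler class. Here is a minimal sketch, assuming a causal language model; the model class and generation settings are illustrative, not prescribed:

from typing import Any, Dict, List

from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points at the local copy of the model repository the endpoint was created from
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(path)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # `data` is the parsed JSON body of the request; "inputs" carries the text to infer on
        text = data["inputs"]
        token_ids = self.tokenizer(text, return_tensors="pt").input_ids
        output_ids = self.model.generate(token_ids, max_new_tokens=50)
        return [{"generated_text": self.tokenizer.decode(output_ids[0], skip_special_tokens=True)}]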

You should also be able to run the encoder, decoder, and inference on the same inference endpoint by following the example here. You would pass an argument to the endpoint, e.g.:

{
  "inputs": "It is so cool that I can encode, decode, and infer on the same endpoint.",
  "function": "encode"
}
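
Inside the handler, __call__ can then branch on that "function" field. A rough sketch of how that could look (the field name, defaults, and return shapes are my own illustration, not an official schema):

from typing import Any, Dict, List

from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(path)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        function = data.get("function", "infer")
        inputs = data["inputs"]

        if function == "encode":
            # text in, token ids out
            return [{"token_ids": self.tokenizer.encode(inputs)}]
        if function == "decode":
            # list of token ids in, text out
            return [{"text": self.tokenizer.decode(inputs, skip_special_tokens=True)}]

        # default: full inference on the input text
        token_ids = self.tokenizer(inputs, return_tensors="pt").input_ids
        output_ids = self.model.generate(token_ids, max_new_tokens=50)
        return [{"generated_text": self.tokenizer.decode(output_ids[0], skip_special_tokens=True)}]

On the client side you POST the JSON payload to the endpoint URL with your Hugging Face token as a Bearer header, for example (the URL and token below are placeholders):

import requests

response = requests.post(
    "https://<your-endpoint>.endpoints.huggingface.cloud",  # placeholder endpoint URL
    headers={"Authorization": "Bearer <HF_TOKEN>", "Content-Type": "application/json"},
    json={
        "inputs": "It is so cool that I can encode, decode, and infer on the same endpoint.",
        "function": "encode",
    },
)
print(response.json())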
Matt Upson