I was working on optimizing a T5 model. I separated the model into its encoder and decoder and converted them to ONNX using the NVIDIA TensorRT repo (https://github.com/NVIDIA/TensorRT/tree/main/demo/HuggingFace), but I am unable to run inference. The model I used is a QA model based on T5, and its predictions are produced with the `generate` method. Is there any way to generate text with T5 without using the `generate` method?

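For context, the kind of manual greedy-decoding loop I have in mind looks roughly like this. This is only a sketch: it uses `onnxruntime` rather than a TensorRT engine, the file names `t5_encoder.onnx` / `t5_decoder.onnx` are placeholders, and the input/output names (`input_ids`, `encoder_hidden_states`, `encoder_attention_mask`) are assumptions that depend on how the models were actually exported.

```python
import numpy as np
import onnxruntime as ort
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")  # placeholder checkpoint
encoder_sess = ort.InferenceSession("t5_encoder.onnx")  # hypothetical file names
decoder_sess = ort.InferenceSession("t5_decoder.onnx")

question = "question: What is T5? context: T5 is a text-to-text transformer."
enc = tokenizer(question, return_tensors="np")

# Run the encoder once; its hidden states are reused at every decoding step.
encoder_hidden = encoder_sess.run(
    None,
    {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]},
)[0]

# Start decoding from the pad token (T5's decoder start token) and append the
# arg-max token at each step until EOS or a length limit is reached.
decoder_ids = np.array([[tokenizer.pad_token_id]], dtype=np.int64)
for _ in range(64):
    logits = decoder_sess.run(
        None,
        {
            "input_ids": decoder_ids,
            "encoder_hidden_states": encoder_hidden,
            "encoder_attention_mask": enc["attention_mask"],
        },
    )[0]
    next_id = int(logits[0, -1].argmax())
    decoder_ids = np.concatenate(
        [decoder_ids, np.array([[next_id]], dtype=np.int64)], axis=1
    )
    if next_id == tokenizer.eos_token_id:
        break

print(tokenizer.decode(decoder_ids[0], skip_special_tokens=True))
```

Is this the right direction, or is there a supported way to do it with the TensorRT demo code?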