I was working on optimizing a T5 model. I separated the model into an encoder and a decoder and converted them to ONNX using the NVIDIA TensorRT repo https://github.com/NVIDIA/TensorRT/tree/main/demo/HuggingFace, but I am unable to run inference. The model I used is a QA model based on T5, and its predictions are produced with the `generate` method. Is there any way to generate output from T5 without using the `generate` method?
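For reference, this is roughly what I am trying to achieve: a minimal greedy decoding loop over the two exported sessions, shown here with ONNX Runtime. The file paths and the input/output names (`input_ids`, `encoder_hidden_states`, `logits`, etc.) are placeholders and may not match the names produced by the export, so treat this as a sketch rather than working code.

```python
# Sketch: greedy decoding with a separately exported T5 encoder and decoder.
# Paths and tensor names below are assumptions -- adjust to your actual export.
import numpy as np
import onnxruntime as ort
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")      # assumption: base checkpoint
encoder = ort.InferenceSession("t5_encoder.onnx")       # assumption: encoder path
decoder = ort.InferenceSession("t5_decoder.onnx")       # assumption: decoder path

text = "question: What is ONNX? context: ONNX is an open format for ML models."
enc = tokenizer(text, return_tensors="np")
input_ids = enc["input_ids"].astype(np.int64)
attention_mask = enc["attention_mask"].astype(np.int64)

# Run the encoder once; its hidden states are reused at every decoding step.
encoder_hidden_states = encoder.run(
    None, {"input_ids": input_ids, "attention_mask": attention_mask}
)[0]

# T5 starts decoding from the pad token; greedily append the argmax token
# until EOS or a length limit is reached.
decoder_input_ids = np.array([[tokenizer.pad_token_id]], dtype=np.int64)
for _ in range(64):
    logits = decoder.run(
        None,
        {
            "input_ids": decoder_input_ids,
            "encoder_hidden_states": encoder_hidden_states,
            "encoder_attention_mask": attention_mask,
        },
    )[0]
    next_token = int(np.argmax(logits[0, -1]))
    decoder_input_ids = np.concatenate(
        [decoder_input_ids, np.array([[next_token]], dtype=np.int64)], axis=1
    )
    if next_token == tokenizer.eos_token_id:
        break

print(tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True))
```

Is something along these lines the right approach, or is there a supported way to drive the split encoder/decoder without reimplementing `generate`?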