I have optimized my deep learning model with TensorRT. A C++ interface runs inference on images with the optimized model on a Jetson TX2. This interface delivers 60 FPS on average, but it is not stable: individual inferences range between 50 and 160 FPS. I need to run this system in real time on a Jetson with a real-time-patched kernel.
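To quantify the jitter, I time each inference call individually. A minimal sketch of the measurement (here `runInference` is just a stub standing in for my actual TensorRT execution call):

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

// Stub standing in for the real TensorRT execution call;
// replace with the actual inference on the optimized engine.
static void runInference() {
    std::this_thread::sleep_for(std::chrono::milliseconds(15));
}

int main() {
    for (int i = 0; i < 1000; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        runInference();
        auto t1 = std::chrono::steady_clock::now();
        // Per-frame latency in milliseconds and the equivalent FPS.
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        std::printf("frame %d: %.2f ms (%.1f FPS)\n", i, ms, 1000.0 / ms);
    }
    return 0;
}
```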
So what are your thoughts on real-time inference with TensorRT? Is it possible to build a real-time inference system with TensorRT, and if so, how?
I have tried setting high priorities on the process and its threads to get preemption. I expect approximately the same FPS on every inference, i.e. a deterministic inference time, but the system's output is not deterministic.
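For context, the priority setup I tried looks roughly like the following (a minimal sketch, not my exact code; it assumes the RT-patched kernel and sufficient privileges, and the chosen priority value of 80 is arbitrary):

```cpp
#include <pthread.h>
#include <sched.h>
#include <sys/mman.h>
#include <cstdio>
#include <cstring>

int main() {
    // Lock current and future pages in RAM so page faults cannot
    // add latency spikes mid-inference (needs CAP_IPC_LOCK or root).
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        std::perror("mlockall");

    // Promote this thread to a real-time FIFO priority so it preempts
    // normal SCHED_OTHER tasks (needs CAP_SYS_NICE or root).
    sched_param param{};
    param.sched_priority = 80;  // valid range is 1 (low) to 99 (high)
    int rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
    if (rc != 0)
        std::fprintf(stderr, "pthread_setschedparam: %s\n", std::strerror(rc));

    // ... inference loop would run here under the RT priority ...
    return 0;
}
```

Even with this in place, the frame-to-frame inference time still varies widely, which is why I am asking whether deterministic TensorRT inference is achievable at all.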