I'm using TensorRT's FP16 precision mode to optimize my deep learning model, and I run the optimized model on a Jetson TX2. While testing, I have observed that the TensorRT inference engine is not deterministic: the optimized model gives FPS values that vary between 40 and 120 for the same input images.
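For context, by FPS I mean throughput measured with a simple wall-clock loop, roughly like the sketch below (`infer()` is a hypothetical stand-in for the actual TensorRT execution call, which I have omitted):

```python
import time

def measure_fps(infer, image, iters=100):
    """Time `iters` inference calls and return frames per second.

    `infer` is a placeholder for the real TensorRT execution call
    (e.g. running the engine's execution context on one image).
    """
    start = time.perf_counter()
    for _ in range(iters):
        infer(image)
    elapsed = time.perf_counter() - start
    return iters / elapsed
```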
I started to suspect that the source of the non-determinism is floating-point operations after seeing this comment about CUDA:
"If your code uses floating-point atomics, results may differ from run to run because floating-point operations are generally not associative, and the order in which data enters a computation (e.g. a sum) is non-deterministic when atomics are used."
Does the precision type (FP16, FP32, or INT8) affect the determinism of TensorRT? Or is something else the cause?
Do you have any thoughts?
Best regards.