I'm trying to run half-precision (FP16) inference with a model written natively with the TensorRT C++ API (not parsed from another framework such as Caffe or TensorFlow). To the best of my knowledge there is no public working example of this; the closest thing I've found is the sampleMLP code released with TensorRT 4.0.0.3, but its release notes state that FP16 is not supported.
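For reference, by "half-precision inference" I mean the standard builder-level switch, roughly as in the sketch below (simplified, not my exact code; `setFp16Mode` is the name I see in the TensorRT 4 `IBuilder` interface, with `setHalf2Mode` on older releases):

```cpp
#include "NvInfer.h"
#include <iostream>

using namespace nvinfer1;

// Minimal logger required by createInferBuilder().
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

ICudaEngine* buildFp16Engine(IBuilder& builder, INetworkDefinition& network)
{
    builder.setMaxBatchSize(1);
    builder.setMaxWorkspaceSize(1 << 20);

    // Request FP16 kernels only if the GPU actually supports them.
    if (builder.platformHasFastFp16())
        builder.setFp16Mode(true); // setHalf2Mode(true) on older releases

    return builder.buildCudaEngine(network);
}
```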
My toy example code can be found in this repo. It contains the architecture and inference routine implemented with the API, plus the Python script I use to convert my dictionary of trained weights to the wtd TensorRT format.
My toy architecture consists of a single convolution. The goal is to obtain similar results between FP32 and FP16, apart from a reasonable loss of precision. The code works as expected with FP32, whereas with FP16 inference I get values of a completely different order of magnitude (~1e40), so it looks like I'm doing something wrong during the conversion.
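Concretely, the network definition boils down to something like the following sketch (simplified placeholders, not my actual code; `kernelData`/`biasData` stand in for the buffers read back from the converted weight file, and the `DataType` passed in the `Weights` structs is exactly the part I suspect I'm getting wrong):

```cpp
#include "NvInfer.h"

using namespace nvinfer1;

// Single-convolution toy network. Whether the weight buffers should contain
// 32-bit floats tagged DataType::kFLOAT, or 16-bit halves tagged
// DataType::kHALF, once FP16 mode is enabled at build time, is what I'm
// unsure about.
INetworkDefinition* buildToyNetwork(IBuilder& builder,
                                    const void* kernelData, int64_t kernelCount,
                                    const void* biasData, int64_t biasCount,
                                    DataType weightType)
{
    INetworkDefinition* network = builder.createNetwork();

    // Single input feature map, e.g. 1x5x5 (placeholder dimensions).
    ITensor* data = network->addInput("data", DataType::kFLOAT, DimsCHW{1, 5, 5});

    Weights convW{weightType, kernelData, kernelCount};
    Weights convB{weightType, biasData, biasCount};

    // One 3x3 convolution producing a single output map.
    IConvolutionLayer* conv =
        network->addConvolution(*data, 1, DimsHW{3, 3}, convW, convB);
    conv->getOutput(0)->setName("out");
    network->markOutput(*conv->getOutput(0));

    return network;
}
```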
I'd appreciate any help in understanding the problem.
Thanks,
f