The model was trained in Python. I have looked into different deployment routes but hit a wall here or there. I summarize them below; please correct me if I am wrong:
+--------------------+-----+------+-----------------+
| Approach           | C++ | FP16 | non-fixed shape |
+--------------------+-----+------+-----------------+
| TensorFlow C++ API | ✓   | ?    | ✓               |
| TensorRT           | ✓   | ✓    | ✗               |
| TF-TRT             | ✗   | ✓    | ✓               |
+--------------------+-----+------+-----------------+
- As of TensorRT 5.1, non-fixed (dynamic) input shapes are still not supported (see the sketch after this list)
- TF-TRT = tensorflow/tensorrt, and for the moment it is only available in Python
- I am aware of TensorRT Inference Server, but I would rather not go for a network-communication-based solution if I don't have to
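To make the fixed-shape point concrete, here is roughly how a network has to be defined with the TensorRT 5.x C++ API, as far as I understand it; the input dimensions and the single ReLU layer are placeholders, not my actual model:

```cpp
#include <NvInfer.h>
#include <iostream>

// Minimal logger that the TensorRT builder requires.
class Logger : public nvinfer1::ILogger {
  void log(Severity severity, const char* msg) override {
    if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
  }
} gLogger;

int main() {
  nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
  nvinfer1::INetworkDefinition* network = builder->createNetwork();

  // In TensorRT 5.x every dimension except the implicit batch dimension
  // must be fully specified here; 3x224x224 is just a placeholder.
  nvinfer1::ITensor* input = network->addInput(
      "input", nvinfer1::DataType::kFLOAT, nvinfer1::Dims3{3, 224, 224});
  nvinfer1::IActivationLayer* relu =
      network->addActivation(*input, nvinfer1::ActivationType::kRELU);
  network->markOutput(*relu->getOutput(0));

  builder->setMaxBatchSize(1);  // only the batch size may vary at runtime
  builder->setFp16Mode(true);   // FP16 works fine, unlike dynamic shapes
  nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);

  if (engine) engine->destroy();
  network->destroy();
  builder->destroy();
  return 0;
}
```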
The "?" in the table means the interplay with Eigen::half
under tensorflow/core/kernels (e.g., inside conv_2d_gpu_half.cu.cc) to achieve FP 16 arithmetic with TensorFlow C++. I don't see many documentation on this but is it the only way to go?
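For that Eigen::half route, the best I can come up with is a sketch like the following: convert the graph to fp16 beforehand on the Python side, then feed and fetch DT_HALF tensors through Eigen::half in C++. The file name and the node names "input"/"output" are made up:

```cpp
#include <memory>
#include <vector>

#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/public/session.h"

using tensorflow::Tensor;

int main() {
  // Load a frozen graph that was already converted to fp16 in Python.
  tensorflow::GraphDef graph_def;
  TF_CHECK_OK(tensorflow::ReadBinaryProto(tensorflow::Env::Default(),
                                          "model_fp16.pb", &graph_def));

  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_CHECK_OK(session->Create(graph_def));

  // DT_HALF tensors are filled through Eigen::half on the C++ side.
  Tensor input(tensorflow::DT_HALF,
               tensorflow::TensorShape({1, 224, 224, 3}));
  auto flat = input.flat<Eigen::half>();
  for (int i = 0; i < flat.size(); ++i) flat(i) = Eigen::half(0.0f);

  std::vector<Tensor> outputs;
  TF_CHECK_OK(session->Run({{"input", input}}, {"output"}, {}, &outputs));
  return 0;
}
```

If this is indeed the intended usage, I would still like to know whether it actually dispatches to the half-precision kernels, or whether there is a more direct, documented path.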
(I am fine with converting my model to other frameworks like MXNet, but similar limitations seem to apply: just change TensorFlow C++ API→MXNet C++ Package, TensorRT→TVM, and TF-TRT→MXNet-TensorRT in the table.)