I am a TensorFlow enthusiast and I am trying to export a model (developed in Python, then frozen and optimized with the TensorFlow tools) for use (inference only) within a C++ project. What I have experienced is that, even after following all the prescriptions found in issues already opened by other users, the C++ executable I obtain after compiling the source is slower in the inference operation (I mean session->Run) by a factor of about 10 compared to the same operation in Python inference code (the exact call I am timing is sketched below).
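For concreteness, this is roughly what the timed call looks like on my side; the "input"/"output" node names here are placeholders for my actual ones, and the timing is a plain std::chrono wall-clock measurement around the single call:

#include <chrono>
#include <vector>
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/public/session.h"

// session: an already-created tensorflow::Session* with the frozen graph loaded.
// input_tensor: the prepared input tensor (its construction is sketched further below).
std::vector<tensorflow::Tensor> outputs;
auto t0 = std::chrono::steady_clock::now();
tensorflow::Status status =
    session->Run({{"input", input_tensor}},  // feed: input node name (placeholder)
                 {"output"},                 // fetch: output node name (placeholder)
                 {},                         // no target nodes
                 &outputs);
auto t1 = std::chrono::steady_clock::now();
// Elapsed wall time for the single Run call, in milliseconds.
double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();

It is this Run call that is roughly 10x slower than the corresponding sess.run in my Python code.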
I am aware of several issues already opened on this topic. Following those, I built the C++ project using the following command:
bazel build -c opt --copt=-mfma --copt=-mfpmath=both //tensorflow/project:project
I also tried using the same batch size for the inference tensor as was used during training, but I still see the same ~10x slowdown in the session->Run call (the tensor construction is sketched below).
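This is roughly how I build the input tensor; the batch size matches the one used at training time, while the remaining dimensions shown here are placeholders for my model's actual input shape:

#include "tensorflow/core/framework/tensor.h"

// batch_size matches training; 224x224x3 is a placeholder shape, not my real one.
const tensorflow::int64 batch_size = 32;
tensorflow::Tensor input_tensor(
    tensorflow::DT_FLOAT,
    tensorflow::TensorShape({batch_size, 224, 224, 3}));
input_tensor.flat<float>().setZero();  // filled with real data in practice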
I am aware that, in principle, the C++ implementation should be at least as fast as the Python one (the Python API ultimately dispatches to the same underlying C++ kernels), so this effect seems counterintuitive to me. My question is whether I am doing something wrong or whether this is simply expected behavior of TensorFlow.
Another question: searching the web, I found claims that freezing a graph slows down inference (I might be wrong about that), but I could not figure out an alternative way of loading a graph in C++ code other than the frozen one (my current loading code is sketched below; in any case, freezing the graph or not has no effect on the Python performance). Perhaps somebody could also explain whether other options are available at the moment.
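For reference, this is how I currently load the frozen graph on the C++ side; the file name is a placeholder for the output of the freeze_graph tool:

#include <memory>
#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/public/session.h"

// "frozen_model.pb" stands in for my actual frozen graph file.
tensorflow::GraphDef graph_def;
tensorflow::Status status = tensorflow::ReadBinaryProto(
    tensorflow::Env::Default(), "frozen_model.pb", &graph_def);
// ... check status.ok() ...
std::unique_ptr<tensorflow::Session> session(
    tensorflow::NewSession(tensorflow::SessionOptions()));
status = session->Create(graph_def);
// ... check status.ok() ...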
Thank you very much in advance for all your kind suggestions, and thank you for the outstanding work on TensorFlow.