I developed and trained a convolutional neural network with TensorFlow and Keras. Now, I want to deploy this model to an Android device, where I need it for a real-time application.
I found two ways to deploy a Keras model to Android:
- Freeze the graph as a .pb-file (e.g., 'model.pb'), and then use a "TensorFlowInferenceInterface" on the Android device.
- Convert the frozen graph to a .tflite model (e.g., 'model.tflite') and then use a TensorFlow Lite Interpreter on the Android device.
Both approaches work on the Android device and yield the expected results. Yet, to my great surprise, inference with the TensorFlow Lite Interpreter takes at least twice as long as inference with the TensorFlowInferenceInterface (on the same device, of course). I checked this on various devices, and the results are similar in all cases.
To create the .tflite model I use the following command (shown here as a plain shell invocation; in my script the file names are held in the variables frozen_graph_name and TFLite_file_name):

tflite_convert \
  --graph_def_file="$frozen_graph_name" \
  --output_file="$TFLite_file_name" \
  --inference_type=FLOAT \
  --input_type=FLOAT \
  --input_shape=1,768,64,1 \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --input_arrays=input_1 \
  --output_arrays=conv2d_10/Sigmoid
Alternatively, I tried the following Python code:
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_keras_model_file('keras_model.h5')
tflite_model = converter.convert()
open(TFLite_file_name, "wb").write(tflite_model)
In both cases the result was the same: the TFLite inference was much slower than the TensorFlowInferenceInterface inference on all Android devices. Adding the optimization flag "OPTIMIZE_FOR_LATENCY" even increased the TFLite inference time by a factor of two.
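For completeness, this is roughly how I set the optimization flag (a minimal, self-contained sketch with a tiny stand-in model; the TF 2.x converter API is shown here, whereas my actual code used tf.lite.TFLiteConverter.from_keras_model_file on the saved .h5 file):

```python
import tensorflow as tf

# Tiny stand-in model (hypothetical; the real network is the CNN described above).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(4, 3, activation="sigmoid", input_shape=(8, 8, 1)),
])

# Convert with the latency optimization flag enabled.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_LATENCY]
tflite_model = converter.convert()

# Write the flatbuffer to disk for deployment.
with open("model_optimized.tflite", "wb") as f:
    f.write(tflite_model)
```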
I checked "TensorFlow Lite quantization fails to improve inference latency", "Why is TensorFlow Lite slower than TensorFlow on desktop?", and "Tensorflow Object Detection inference slow on CPU" but did not find any satisfactory answers.
According to all the docs I found, TFLite is supposed to be much faster on Android devices. So what can I do to speed up my TFLite inference on Android? On my PC, TFLite is indeed faster, which makes this even more surprising.
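For reference, this is roughly how I measured the TFLite latency on the PC (a minimal, self-contained sketch: it converts a tiny stand-in model in memory and times tf.lite.Interpreter.invoke(); the real measurement loaded my converted model with Interpreter(model_path=...)):

```python
import time

import numpy as np
import tensorflow as tf

# Convert a tiny stand-in model in memory (hypothetical; the real test
# used the CNN converted as shown above).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(4, 3, activation="sigmoid", input_shape=(8, 8, 1)),
])
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# Load the flatbuffer into the TFLite interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Feed a random input of the expected shape; warm up once.
x = np.random.rand(*inp["shape"]).astype(np.float32)
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()

# Time repeated invocations and report the mean latency.
runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
mean_ms = (time.perf_counter() - start) / runs * 1000
print(f"mean TFLite latency: {mean_ms:.2f} ms")
```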
Any help is greatly appreciated!