
I have a regular Keras model and I use tf.lite.TFLiteConverter.from_keras_model_file to convert it to a .tflite model. Then I use the interpreter to run inference on images.
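For reference, the conversion step looks roughly like this (a minimal sketch with the TF 1.x converter API; `h5_model_path`, `tflite_model_path` and the custom `loss` are placeholders, not my exact values):

converter = tf.lite.TFLiteConverter.from_keras_model_file(
    h5_model_path, custom_objects={'loss': loss})  # placeholder path and loss
tflite_model = converter.convert()
with open(tflite_model_path, 'wb') as f:           # placeholder path
    f.write(tflite_model)

The inference loop then looks like this: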

import tensorflow as tf

tf.logging.set_verbosity(tf.logging.DEBUG)
interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]
for image in images:  # read/preprocess each image to match the input tensor's shape and dtype
    interpreter.set_tensor(input_index, image)
    interpreter.invoke()
    result = interpreter.get_tensor(output_index)

With the regular model, I use the following to do the prediction:

from tensorflow import keras  # or `import keras` for standalone Keras

model = keras.models.load_model({h5 model path}, custom_objects={'loss': loss})
for image in images:  # read/preprocess each image
    result = model.predict(image)

However, the elapsed time for inference with the .tflite model is much longer than with the regular model. I also tried post-training quantization on the .tflite model, but that one is the slowest of the three. Does this make sense? Why does it happen? Is there any way to make the TensorFlow Lite model faster than the regular one? Thanks.
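For the quantized variant, I enabled post-training quantization roughly like this (again just a sketch; depending on the TF version the flag is `converter.optimizations = [tf.lite.Optimize.DEFAULT]` or the older `converter.post_training_quantize = True`):

converter = tf.lite.TFLiteConverter.from_keras_model_file(
    h5_model_path, custom_objects={'loss': loss})    # placeholder path and loss, as above
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # on older TF: converter.post_training_quantize = True
tflite_quant_model = converter.convert()
with open(quantized_tflite_path, 'wb') as f:          # placeholder path
    f.write(tflite_quant_model)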

  • Are you running this on computer (e.g. desktop/laptop) or mobile devices (e.g. Android/iOS)? – miaout17 Jan 17 '19 at 18:54
  • Possible duplicate of [Why is TensorFlow Lite slower than TensorFlow on desktop?](https://stackoverflow.com/questions/54093424/why-is-tensorflow-lite-slower-than-tensorflow-on-desktop) – miaout17 Jan 17 '19 at 19:20
  • @miaout17 Thank you for the explanation in the other question. I have one question here: you said "If SSE is available, it will try to use NEON_2_SSE to adapt NEON calls to SSE". Does that mean inference on a computer is slower because adapting the NEON calls to SSE through NEON_2_SSE may cost more time? – debug_all_the_time Jan 18 '19 at 21:50
  • @miaout17 I also have another question, about using `setNumThreads`. I have two .tflite models: one is the regular .tflite and the other is a post-training quantized one. I set the number of threads to 4, but only inference with the regular .tflite model is sped up; inference with the quantized model is still slow. Do you know why this happens? Thank you! – debug_all_the_time Jan 18 '19 at 22:01

0 Answers