I have a regular Keras model, and I use tf.lite.TFLiteConverter.from_keras_model_file
to convert it to a .tflite model. Then I use the interpreter to run inference on images.
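For reference, the conversion step looks roughly like this (a simplified sketch with placeholder paths; the custom loss handling is omitted):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model_file("model.h5")  # placeholder path
tflite_model = converter.convert()
# write the flatbuffer to disk so the interpreter can load it later
with open("model.tflite", "wb") as f:
    f.write(tflite_model)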
tf.logging.set_verbosity(tf.logging.DEBUG)
interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]
for image_path in image_paths:
    image = ...  # read the image and preprocess it to the input tensor's shape and dtype
    interpreter.set_tensor(input_index, image)
    interpreter.invoke()
    result = interpreter.get_tensor(output_index)
With the regular model, I use the following to do the prediction.
model = keras.models.load_model({h5 model path}, custom_objects={'loss': loss})
for image_path in image_paths:
    image = ...  # read and preprocess the image the same way as above
    result = model.predict(image)
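To compare the two, I measure the elapsed time roughly like this (a simplified sketch: the timing just wraps each of the loops shown above, using the standard-library time module):

import time

start = time.time()
# run the .tflite interpreter loop from above here
tflite_seconds = time.time() - start

start = time.time()
# run the Keras model.predict loop from above here
keras_seconds = time.time() - start

print("tflite:", tflite_seconds, "keras:", keras_seconds)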
However, the elapsed time for inference with the .tflite model is much longer than with the regular model. I also tried post-training quantization on the .tflite model (see the sketch below), but the quantized model is the slowest of the three. Does that make sense? Why does this happen? Is there any way to make the TensorFlow Lite model faster than the regular one? Thanks.
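For reference, the post-training quantized model was produced roughly like this; this is a sketch with placeholder paths and assumes the tf.lite.Optimize.DEFAULT flag (post-training weight quantization) available in recent TF 1.x releases:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model_file("model.h5")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables post-training (weight) quantization
tflite_quant_model = converter.convert()
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_quant_model)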