I trained my own tflite classification model with 3 classes following this tutorial, and I am now trying to test it by applying it to a video feed. Here is my inference code:
import cv2
import numpy as np
from matplotlib import pyplot as plt
from PIL import Image
import tensorflow.lite as tflite
Model_Path = "/path/to/model.tflite"
labels = ["class1", "class2", "class3"]
##Load tflite model and allocate tensors
interpreter = tflite.Interpreter(model_path=Model_Path)
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]["shape"]
vid_file = "/path/to/video.mp4"
# Create a VideoCapture object and read from input file
cap = cv2.VideoCapture(vid_file)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # stop when the video ends
        break
    cv_image = preprocess(frame)

    ##Converting image into tensor
    image = np.array(cv_image, dtype=np.float32)
    input_tensor = np.array(np.expand_dims(image, 0))
    interpreter.set_tensor(input_details[0]["index"], input_tensor)
    interpreter.invoke()
    output_details = interpreter.get_output_details()
    output_data = interpreter.get_tensor(output_details[0]["index"])
    pred = np.squeeze(output_data)
    classi = np.argmax(pred)

    # write prediction in the corner
    cv2.putText(
        frame,
        labels[classi],
        (10, 50),
        cv2.FONT_HERSHEY_SIMPLEX,
        1,
        (255, 255, 255),
        2,
        cv2.LINE_AA,
    )

    cv2.namedWindow("cv_image", cv2.WINDOW_NORMAL)
    cv2.imshow("cv_image", frame)

    ##Use p to pause the video and q to terminate the program
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):
        break
    elif key == ord("p"):
        cv2.waitKey(0)
        continue

cap.release()
cv2.destroyAllWindows()
with preprocess() defined as:
def preprocess(image):
    # *** some image cropping, just as for training data ***
    # resize image to 224x224
    image = cv2.resize(image, (224, 224))
    new_img = image.astype(np.float32)
    new_img /= 255.0
    return image
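For completeness, this is how I inspect what the interpreter expects as input and produces as output (just printing the tensor details right after allocate_tensors(); the actual values of course depend on the converted model):

# Sanity check: print what the interpreter expects and produces.
print("input :", input_details[0]["shape"], input_details[0]["dtype"])
print("output:", output_details[0]["shape"], output_details[0]["dtype"])
print("input quantization :", input_details[0]["quantization"])
print("output quantization:", output_details[0]["quantization"])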
The prediction seems to be okay using argmax, but if I look at the confidence values, they are all negative (most of the time):
[-2.3782427 -1.6677225 -3.0637422]
[-2.4214256 -1.2143787 -3.4843316]
[-1.6566806 -2.1574929 -3.1999807]
[-1.9782547 -2.7043173 -2.0971687]
This is quite problematic: on the one hand it makes me doubt that everything really works as it should, and on the other hand I cannot add any post-processing logic to rule out false positives (for example, flagging frames where two classes both come out above 50% or so).
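To make this concrete, below is roughly the filter I would like to apply to pred after interpreter.invoke(), assuming the values were probabilities between 0 and 1 (the 0.5 threshold is just a placeholder, not tuned):

# Hypothetical confidence filter -- only meaningful if `pred` holds
# probabilities in [0, 1], which the raw values above clearly are not.
CONF_THRESHOLD = 0.5  # example cut-off
classi = int(np.argmax(pred))
if pred[classi] >= CONF_THRESHOLD:
    label_text = labels[classi]
else:
    label_text = "uncertain"  # skip / flag likely false positives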
Does anyone know what the issue could be? Previously I made the mistake of not normalising the image in the preprocessing the way it was done for training. Could there still be a difference that I am not seeing?
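In case it is relevant, this is the kind of quick check I plan to run on a single frame to see whether the preprocessing output actually lands in the roughly 0-1 range I expect after the /255 normalisation:

# One-off check of the preprocessing output on the first frame.
cap = cv2.VideoCapture(vid_file)
ok, frame = cap.read()
cap.release()
if ok:
    sample = preprocess(frame)
    print("dtype:", sample.dtype)  # I expect float32 here
    print("min / max / mean:", sample.min(), sample.max(), sample.mean())  # I expect roughly 0-1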