I trained my own TFLite classification model with 3 classes following this tutorial, and I am now trying to test it by applying it to a video feed. Here is my inference code:

import cv2
import numpy as np
import tensorflow.lite as tflite

Model_Path = "/path/to/model.tflite"

labels = ["class1", "class2", "class3"]

## Load TFLite model and allocate tensors
interpreter = tflite.Interpreter(model_path=Model_Path)
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

input_shape = input_details[0]["shape"]

vid_file = "/path/to/video.mp4"
# Create a VideoCapture object and read from input file
cap = cv2.VideoCapture(vid_file)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        # stop when the video ends or a frame cannot be read
        break
    cv_image = preprocess(frame)
    cv_image = preprocess(frame)

    ## Convert the image into a float32 tensor with a batch dimension
    image = np.array(cv_image, dtype=np.float32)
    input_tensor = np.expand_dims(image, 0)

    interpreter.set_tensor(input_details[0]["index"], input_tensor)
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]["index"])
    pred = np.squeeze(output_data)
    classi = np.argmax(pred)

    # write prediction in the corner
    cv2.putText(
        frame,
        labels[classi],
        (10, 50),
        cv2.FONT_HERSHEY_SIMPLEX,
        1,
        (255, 255, 255),
        2,
        cv2.LINE_AA,
    )

    cv2.namedWindow("cv_image", cv2.WINDOW_NORMAL)
    cv2.imshow("cv_image", frame)

    ## Use p to pause the video and q to terminate the program
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):
        break
    elif key == ord("p"):
        cv2.waitKey(0)
        continue

cap.release()
cv2.destroyAllWindows()

with preprocess() defined as:

def preprocess(image):

    # *** some image cropping, just as for training data ***

    # resize image to 224x224 and normalise to [0, 1], as in training
    image = cv2.resize(image, (224, 224))
    new_img = image.astype(np.float32)
    new_img /= 255.0
    return new_img
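
To rule out an input mismatch on my side, this is the sanity check I run against the interpreter's reported input details (as far as I understand, the quantization field should be (0.0, 0) for a non-quantized float model):

# Compare the model's expected input with what the code feeds it
print("shape:", input_details[0]["shape"])  # expecting [1 224 224 3]
print("dtype:", input_details[0]["dtype"])  # expecting float32
print("quantization:", input_details[0]["quantization"])  # (0.0, 0) for a float model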

The predictions seem to be okay when using argmax, but if I look at the confidence values, they are all negative (most of the time):

[-2.3782427 -1.6677225 -3.0637422]
[-2.4214256 -1.2143787 -3.4843316]
[-1.6566806 -2.1574929 -3.1999807]
[-1.9782547 -2.7043173 -2.0971687]

This is quite problematic: on the one hand it makes me doubt that everything really works as it should, and on the other I cannot add any post-processing logic to rule out false positives (for example, requiring the winning class to score more than 50% or so).
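
For illustration, this is the kind of post-processing I have in mind; it obviously only makes sense if pred held probabilities in [0, 1], which the raw values above clearly are not:

# Hypothetical confidence gate, assuming pred held probabilities in [0, 1];
# the 0.5 threshold is just a placeholder
CONF_THRESHOLD = 0.5
if pred[classi] < CONF_THRESHOLD:
    label = "uncertain"  # treat low-confidence frames as false positives
else:
    label = labels[classi]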

Does anyone know what the issue could be? Previously I made the mistake that the preprocessing didn't normalise the image the way it was done in training. Could there still be a difference that I don't see?
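
One guess: if the model's last layer outputs raw logits rather than softmax probabilities (an assumption on my part, since the values look like pre-softmax scores), I could normalise them myself, though I'm not sure that's the right fix:

# Numerically stable softmax; only valid if pred really holds raw logits
def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

probs = softmax(pred)
# e.g. the first row of values above maps to roughly [0.28, 0.58, 0.14]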

André
