I am trying to understand the output of the TFLite iris landmarks model (iris_landmark.tflite) available from MediaPipe.
The model card describes the output as 71 2D landmarks (eye contours and brows) and 5 2D landmarks (iris). When I inspect the model as follows:
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='iris_landmark.tflite')
interpreter.allocate_tensors()
output_details = interpreter.get_output_details()
print(output_details)

this prints:
[{'dtype': numpy.float32,
'index': 384,
'name': 'output_eyes_contours_and_brows',
'quantization': (0.0, 0),
'quantization_parameters': {'quantized_dimension': 0,
'scales': array([], dtype=float32),
'zero_points': array([], dtype=int32)},
'shape': array([ 1, 213], dtype=int32),
'shape_signature': array([ 1, 213], dtype=int32),
'sparsity_parameters': {}},
{'dtype': numpy.float32,
'index': 385,
'name': 'output_iris',
'quantization': (0.0, 0),
'quantization_parameters': {'quantized_dimension': 0,
'scales': array([], dtype=float32),
'zero_points': array([], dtype=int32)},
'shape': array([ 1, 15], dtype=int32),
'shape_signature': array([ 1, 15], dtype=int32),
'sparsity_parameters': {}}]
I see 213 values and 15 values in the model outputs, i.e. 71 × 3 and 5 × 3, so I assume I am getting an x/y/z coordinate for each point. After running the model on an image I get values roughly in the -7000 to +7000 range. My input was a 64x64 image; any idea of how these points correspond to the original image?
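For reference, this is roughly how I run the model and read the outputs. The preprocessing is my own guess (scaling pixel values to [0, 1]), and the (71, 3)/(5, 3) reshape just reflects my x/y/z assumption above:

import cv2
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='iris_landmark.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# 64x64 RGB eye crop; scaling to [0, 1] is my guess at the expected preprocessing
img = cv2.cvtColor(cv2.imread('eye_crop.png'), cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (64, 64)).astype(np.float32) / 255.0

interpreter.set_tensor(input_details[0]['index'], img[np.newaxis, ...])
interpreter.invoke()

# Reshape the flat outputs, assuming (x, y, z) per landmark
eye_contours = interpreter.get_tensor(output_details[0]['index']).reshape(71, 3)  # 213 values
iris = interpreter.get_tensor(output_details[1]['index']).reshape(5, 3)           # 15 values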
I would like to obtain the pixel coordinates of the eye keypoints, as they are rendered in the MediaPipe examples.
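In other words, I was hoping to map the landmarks back to the source image with something like the sketch below, assuming the model returns coordinates in the 64x64 crop's pixel space (the crop_x/crop_y/crop_w/crop_h parameters describing where the eye crop sits in the original frame are my own invention). The -7000 to +7000 values obviously don't fit that assumption:

import numpy as np

def to_original_pixels(landmarks, crop_x, crop_y, crop_w, crop_h, input_size=64):
    # Hypothetical mapping: rescale from the 64x64 crop to the crop's size in
    # the original image, then shift by the crop's top-left corner.
    pts = np.asarray(landmarks, dtype=np.float32).copy()
    pts[:, 0] = pts[:, 0] / input_size * crop_w + crop_x
    pts[:, 1] = pts[:, 1] / input_size * crop_h + crop_y
    return pts[:, :2]

# e.g. eye_pixels = to_original_pixels(eye_contours, crop_x, crop_y, crop_w, crop_h)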