I've trained a model with Azure Custom Vision and exported it as "TensorFlow - SavedModel". I'm running the model locally with the helper code included in the export, slightly modified to read frames from a live video feed using OpenCV's VideoCapture()/read() in a loop.
Detection itself works well: the console correctly prints the array of results from the Custom Vision model's predictions, including bounding box coordinates that appear to be normalized values. The problem is that I can't get accurate bounding boxes drawn on the output video stream.
Before switching to Azure Custom Vision, I used existing models from the Model Zoo, and the Object Detection API's Python visualization helpers drew bounding boxes correctly on the displayed feed. The coordinates returned by Azure Custom Vision, however, appear to be different from the ones returned by the default COCO SSD models.
I need to convert the bounding box coordinates returned from Azure Custom Vision into values the TensorFlow Object Detection API visualization helpers understand.
Original code using the Object Detection API and a COCO SSD model (works!):

    output_dict = run_inference_for_single_image(image_resize, graph)

    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=2)

    cv2.imshow('object_detection', cv2.resize(image_np, (640, 480)))
Azure Custom Vision version, not displaying boxes correctly:

    image_resized = cv2.resize(image_np, (512, 512))
    predictions = od_model.predict_image(Image.fromarray(image_resized))

    if len(predictions) > 0:
        print(predictions)

        output_dict = {}
        output_dict['detection_boxes'] = [???]  # <-- How do I populate this with a compatible shape?
        output_dict['detection_scores'] = np.asarray([sub['probability'] for sub in predictions])
        output_dict['detection_classes'] = np.asarray([sub['tagId'] for sub in predictions])
        output_dict['detection_class_names'] = np.asarray([sub['tagName'] for sub in predictions])

        vis_util.visualize_boxes_and_labels_on_image_array(
            image_np,
            output_dict['detection_boxes'],
            output_dict['detection_classes'],
            output_dict['detection_scores'],
            category_index,
            instance_masks=output_dict.get('detection_masks'),
            use_normalized_coordinates=True,
            line_thickness=2)
Console output showing the Custom Vision model's response:
{'probability': 0.95146583, 'tagId': 1, 'tagName': 'MyObject', 'boundingBox': {'left': 0.11083871, 'top': 0.65143364, 'width': 0.05332406, 'height': 0.04930339}}
{'probability': 0.92589812, 'tagId': 0, 'tagName': 'OtherObject', 'boundingBox': {'left': 0.24750886, 'top': 0.68784532, 'width': 0.54308632, 'height': 0.17839652}}
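Based on that output, here is my best guess at the translation, assuming the visualization helper wants a NumPy array of normalized [ymin, xmin, ymax, xmax] rows when use_normalized_coordinates=True (the boxes_to_tf_format name is just mine):

```python
import numpy as np

def boxes_to_tf_format(predictions):
    """Convert Custom Vision 'boundingBox' dicts (normalized left/top/width/height)
    into normalized [ymin, xmin, ymax, xmax] rows for the TF visualization helper."""
    boxes = []
    for p in predictions:
        bb = p['boundingBox']
        # Top-left corner plus width/height -> min/max corners.
        boxes.append([bb['top'],                  # ymin
                      bb['left'],                 # xmin
                      bb['top'] + bb['height'],   # ymax
                      bb['left'] + bb['width']])  # xmax
    return np.asarray(boxes)

# Sample prediction copied from the console output above.
predictions = [{'probability': 0.95146583, 'tagId': 1, 'tagName': 'MyObject',
                'boundingBox': {'left': 0.11083871, 'top': 0.65143364,
                                'width': 0.05332406, 'height': 0.04930339}}]
print(boxes_to_tf_format(predictions))
```

This is what I've been assigning to output_dict['detection_boxes'], but the boxes still don't land where I expect.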
With the Azure Custom Vision model I can't get bounding boxes to display in the right place. I managed to convert the Custom Vision "boundingBox" values into the same shape the visualization helper expects, but the boxes are never drawn at the correct coordinates. My suspicion is that either the coordinate systems of the COCO SSD output tensors and the Custom Vision prediction response are calculated differently, or the coordinates are in a different order between the two shapes.
Has anyone already solved this translation? Am I doing something wrong? Thanks in advance!