
I'm trying to create a custom object detection model in tflite format so that I can use it in a flutter application with the google_mlkit_object_detection package.

I created the first few models with YOLOv8 and converted them to TFLite. The annotations were made with Roboflow, and I used the Google Colab notebook they provide. The metadata of the converted TFLite model looks like the following image:

[screenshot: metadata of the YOLOv8-converted TFLite model]

With this model I was getting the error

Input tensor has type kTfLiteFloat32: it requires specifying NormalizationOptions metadata to preprocess input images.

So, as suggested, I tried to edit the metadata and add NormalizationOptions, but failed to do so. My second alternative was to train a model with the official TensorFlow Google Colab notebook, TensorFlow Lite Model Maker, which generated a model with the following metadata:

[screenshot: metadata of the Model Maker model]

For this model the error was

Unexpected number of dimensions for output index 1: got 3D, expected either 2D (BxN with B=1) or 4D

So I checked the model from the example app of the package I am using, google_mlkit_object_detection, and its metadata looks like this:

[screenshot: metadata of the example app's model]

So my question is: how can I alter the models I already trained (whichever is easier) so that both the input and the output look like this? Do I have to alter the model's architecture, or just the metadata? For the second model, trained with the official TensorFlow notebook, it seems that all I have to do is set the correct output shape format [1, N], but again, I might have to change the architecture.
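For reference, the input and output shapes that ML Kit complains about can be inspected directly with the TFLite interpreter. A minimal sketch, assuming TensorFlow is installed and using a placeholder model path:

```python
def print_tensor_shapes(model_path):
    """Print the input and output tensor shapes/dtypes of a .tflite model."""
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    for detail in interpreter.get_input_details():
        print("input ", detail["name"], detail["shape"], detail["dtype"])
    for detail in interpreter.get_output_details():
        # ML Kit expects a classifier-style output here: [1, N] (2D) or
        # [1, 1, 1, N] (4D); a 3D output triggers the dimensions error above.
        print("output", detail["name"], detail["shape"], detail["dtype"])
```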

2 Answers


The Custom Models page on the Google documentation says this:

Note: ML Kit only supports custom image classification models. Although AutoML Vision allows training of object detection models, these cannot be used with ML Kit.

So there it is: you can use custom models, but only image classification models. I thought this must be impossible, since an object detection model outputs bounding boxes while a classification model outputs class scores. However, I tried it with YOLOv8: the standard object detection model wouldn't work, but the classification model with the [1, 1000] output shape does work with the Google ML Kit example application, and you can extract bounding boxes from it.

I'm not 100% sure how this works, but I suspect the package bundles a default object detector that identifies where objects could be, and you can only swap in the classification model that runs on top of it.

Anyway, the simple answer is: use a classification model with a [1, N] or [1, 1, 1, N] output, where N is the number of classes. If your model has a different output shape, you will have to change it to this format; otherwise it is not supposed to work.
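To make the shape contract concrete, here is a small NumPy illustration (the class count and score values are made up):

```python
import numpy as np

N = 5  # number of classes (illustrative)
scores = np.array([[0.05, 0.10, 0.60, 0.20, 0.05]])  # classifier-style output

assert scores.shape == (1, N)  # the [1, N] (2D, B=1) shape ML Kit accepts

# A [1, 1, 1, N] (4D) output is also accepted and carries the same data:
scores_4d = scores.reshape(1, 1, 1, N)
top_class = int(np.argmax(scores_4d))
print(top_class)  # index of the highest-scoring class
```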


Metadata only describes the model. Besides adding metadata, you need to make sure your model really meets the requirements.

More details about how to get such a model can be found here: https://developers.google.com/ml-kit/custom-models
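As an aside on the NormalizationOptions error mentioned in the question: such metadata is usually attached with the TFLite Support library's metadata writers. A minimal sketch, assuming the `tflite_support` package is installed and using placeholder file paths (there is also an `object_detector` writer for detector-style models):

```python
def add_normalization_metadata(model_path, label_path, out_path):
    """Attach image-classifier metadata (including NormalizationOptions)
    to a TFLite model. Sketch only: assumes tflite_support is installed
    and the paths are placeholders for your own files."""
    from tflite_support.metadata_writers import image_classifier, writer_utils

    # mean/std of 127.5 maps uint8 pixels [0, 255] to floats in [-1, 1],
    # a common convention for MobileNet-style models; adjust to whatever
    # normalization your model was actually trained with.
    writer = image_classifier.MetadataWriter.create_for_inference(
        writer_utils.load_file(model_path),
        input_norm_mean=[127.5],
        input_norm_std=[127.5],
        label_file_paths=[label_path],
    )
    writer_utils.save_file(writer.populate(), out_path)
```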

The ML Kit Object Detection custom model is a classifier model.

Steven
  • As I mentioned in my question, I followed the official TensorFlow Google Colab notebook and used the EfficientDet0 architecture, which should have generated metadata that looks like [this](https://tfhub.dev/tensorflow/efficientdet/lite0/detection/1), but the shapes are empty. So I guess it does meet the requirements, and all I need to do is add the metadata, but I can't find a guide on how to do so. – gustavo martins Apr 24 '23 at 02:14
  • Does this link help with adding metadata? https://developers.google.com/ml-kit/custom-models#metadata – Steven Apr 26 '23 at 05:00
  • Hey Steven, from the link you sent on how to add metadata I found the model mobilenet_v1_0.75_160_quantized, and its metadata looks identical to that of the example app model I mentioned in my question. But then I realized it is a classification model. I am very confused, since the package is designed for object detection yet won't accept any output with dimensions other than 2 or 4, while most architectures have 3 or 1, so I don't know what to do. – gustavo martins May 05 '23 at 09:17