
I have trained an object detection CoreML model using Microsoft's customvision.ai service. I exported it to use in my app to recognize certain objects in real time using the camera. However, the CoreML model outputs a MultiArray of type Double. I have no idea how to decipher or use this data, as this is my first time working with multidimensional arrays. I have been trying to find out what a Custom Vision object detection model is supposed to output (such as a CGRect or a UIImage) so I know what I am trying to convert my MultiArray to, but cannot find this information anywhere on Microsoft's website. Microsoft seems to have a demo app for image classification models, but nothing for object detection models.

To get a sense of what might be in the multidimensional array I have tried printing it out and get this result...

Double 1 x 1 x 40 x 13 x 13 array

I have also tried printing the .strides element of the multidimensional array and got this...

[6760, 6760, 169, 13, 1]

I don't know if this info is actually useful, just wanted to give you guys everything I have done so far.

So, my question is: what information does this MultiArray hold (is it something like a UIImage or a CGRect, or something different?), and how can I convert this multidimensional array into a useful set of data that I can actually use?

coder
3 Answers


I haven't used the customvision.ai service, but I've worked with object detection models before. The 13x13 array is most likely a grid that covers the input image. For each cell in this array -- usually corresponding to a block of 32x32 pixels in the original image -- there is a prediction of 40 numbers.

What those 40 numbers mean depends a little on what sort of model customvision.ai uses. But typically they contain coordinates for one or more bounding boxes, as well as class probabilities.

In case the model is YOLO (which seems likely, as that also has a 13x13 output grid) there are multiple predictions per cell. Each prediction has 4 numbers to describe a bounding box, 1 number to describe the probability this bounding box contains an object, and num_classes numbers with the probabilities for the different classes.

So there are (5 + num_classes) x num_predictions numbers per grid cell. If the model makes 5 predictions per grid cell and you have trained on 3 classes, you get (5 + 3)*5 = 40 numbers per grid cell.

Note that I'm making a lot of assumptions here because I don't know anything about your model type and how many classes of objects you trained on.

Those 40 numbers may or may not contain real bounding box coordinates yet. You may need to write additional code to "decode" these numbers. Again, the logic for this depends on the model type.
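To make the decoding step concrete, here is a rough sketch of what it could look like for one grid cell, assuming the YOLO-style layout described above: 5 boxes per cell, (5 + 3) numbers per box stored box-by-box, channel-first memory so that the flat index is channel * 169 + row * 13 + col (which matches the strides in your question). The anchor sizes are placeholders; a real model ships its own, and the channel ordering may differ, so treat this purely as an illustration of the arithmetic, not as drop-in code for your model:

```swift
import Foundation

let gridSize = 13
let boxesPerCell = 5
let numClasses = 3
let blockSize = 32.0  // each cell covers a 32x32 block of input pixels
// Placeholder anchor sizes (in cell units); substitute your model's values.
let anchors: [(w: Double, h: Double)] = [
    (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (5.0, 5.0)
]

func sigmoid(_ x: Double) -> Double { 1.0 / (1.0 + exp(-x)) }

struct Prediction {
    var x = 0.0, y = 0.0, width = 0.0, height = 0.0  // input-image pixels
    var confidence = 0.0
    var classIndex = 0
}

// `features` is the 40 x 13 x 13 output flattened into [Double], channel-first,
// so the flat index is channel * 169 + row * 13 + col.
func decodeCell(features: [Double], row: Int, col: Int) -> [Prediction] {
    let plane = gridSize * gridSize
    func value(_ channel: Int) -> Double {
        features[channel * plane + row * gridSize + col]
    }
    var predictions: [Prediction] = []
    for b in 0..<boxesPerCell {
        let base = b * (5 + numClasses)
        var p = Prediction()
        // tx, ty are offsets inside the cell; tw, th scale the anchor box.
        p.x = (Double(col) + sigmoid(value(base + 0))) * blockSize
        p.y = (Double(row) + sigmoid(value(base + 1))) * blockSize
        p.width = exp(value(base + 2)) * anchors[b].w * blockSize
        p.height = exp(value(base + 3)) * anchors[b].h * blockSize
        let objectness = sigmoid(value(base + 4))
        // Softmax over the class scores, then keep the best class.
        let expScores = (0..<numClasses).map { exp(value(base + 5 + $0)) }
        let total = expScores.reduce(0, +)
        let best = expScores.enumerated().max { $0.element < $1.element }!
        p.confidence = objectness * best.element / total
        p.classIndex = best.offset
        predictions.append(p)
    }
    return predictions
}
```

After decoding all 13 x 13 cells you would typically filter by a confidence threshold and run non-maximum suppression to remove overlapping boxes.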

I'd assume customvision.ai has some documentation or sample code on how to do this.

You can also read more about this topic in several of my blog posts.
Matthijs Hollemans

9 months later, I stumbled upon your question while trying to solve this exact problem. Having found the solution today, I thought I'd post it up.

Have a look at this github sample.

https://github.com/Azure-Samples/cognitive-services-ios-customvision-sample/tree/master/CVS_ObjectDetectorSample_Swift

It makes use of a CocoaPod named MicrosoftCustomVisionMobile.

That pod contains the CVSInference framework, which has a class, CVSObjectDetector, that will do all the heavy lifting of parsing the multidimensional MLMultiArray output for you. All you need to do is feed it the UIImage for detection and run the inference. Then you can read the detected identifiers, their bounding boxes, and their confidences using the strongly typed properties of CVSObjectDetector. Make sure you transform the coordinates back to your view space before drawing!
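That last step (transforming back to view space) trips a lot of people up, so here is a minimal sketch of the arithmetic, assuming the detector reports normalized boxes (origin top-left, values in 0...1) and that your preview layer scales the camera image with aspect fill. In a real app you would use CGRect, but the math is the same; the Box struct is just a stand-in:

```swift
import Foundation

struct Box { var x: Double; var y: Double; var width: Double; var height: Double }

// Map a normalized box into view coordinates under aspect-fill scaling.
func viewRect(forNormalized box: Box,
              imageSize: (w: Double, h: Double),
              viewSize: (w: Double, h: Double)) -> Box {
    // Aspect-fill scale factor and the resulting crop offsets.
    let scale = max(viewSize.w / imageSize.w, viewSize.h / imageSize.h)
    let scaledW = imageSize.w * scale
    let scaledH = imageSize.h * scale
    let offsetX = (scaledW - viewSize.w) / 2
    let offsetY = (scaledH - viewSize.h) / 2
    return Box(x: box.x * scaledW - offsetX,
               y: box.y * scaledH - offsetY,
               width: box.width * scaledW,
               height: box.height * scaledH)
}
```

If you use aspect fit instead of aspect fill, swap the max for a min and the offsets become letterbox padding rather than a crop.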

If you are working in Xamarin like me, you could use sharpie to create a C# binding for the pod and you'll be in business.


Quite a late answer, but I faced the same issue and this is my solution. You should get your prediction with something similar to this:

guard let modelOutput = try? model.prediction(input: modelInput) else {
    fatalError("Unexpected runtime error.")
}

Then, based on the output name defined in your model (here the name is "Identity"), you should be able to access the data in the multidimensional array like so:

for i in 0..<modelOutput.Identity.count {
    print(modelOutput.Identity[i].floatValue)
}
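That loop walks the array in flat (row-major) order. If you want to know which grid position a flat index corresponds to, you can invert the layout yourself; as a sketch, assuming the [1, 1, 40, 13, 13] shape from the question (so the strides are [..., 169, 13, 1], as printed there):

```swift
import Foundation

let shape = (channels: 40, rows: 13, cols: 13)

// Convert a flat index into (channel, row, col) for a channel-first layout.
func coordinates(ofFlatIndex i: Int) -> (channel: Int, row: Int, col: Int) {
    let plane = shape.rows * shape.cols  // 169, the channel stride
    let channel = i / plane
    let row = (i % plane) / shape.cols
    let col = i % shape.cols
    return (channel, row, col)
}
```

Equivalently, you can pass [NSNumber] indices to MLMultiArray's subscript and let it apply the strides for you.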
MrPOHB