
Using Apple's Create ML application (a developer tool that comes with Xcode), I trained an image classification model and downloaded it. I then loaded the model in a Python project using the coremltools package:

import coremltools
import PIL.Image

def load_image(path, resize_to=None):
    img = PIL.Image.open(path)
    if resize_to is not None:
        img = img.resize(resize_to, PIL.Image.ANTIALIAS)
    r, g, b = img.split()
    img = PIL.Image.merge("RGB", (b, g, r))
    return img

model = coremltools.models.MLModel('classification1.mlmodel')
img_path = "img1"
img = load_image(img_path, resize_to=(299, 299))

result = model.predict({'image': img})
print(result)

This code printed a predicted class label different from the result I got when I predicted the label for img1 directly in the Create ML application. I believe the application makes some adjustment to the input image before predicting its class label. When I run print(model), I get the following information about the input:

input {
name: "image"
shortDescription: "Input image to be classified"
type {
  imageType {
    width: 299
    height: 299
    colorSpace: BGR
    imageSizeRange {
      widthRange {
        lowerBound: 299
        upperBound: -1
      }
      heightRange {
        lowerBound: 299
        upperBound: -1
      }
    }
  }
 }
}

I believe I have made the required adjustments by resizing the image and converting the color space. Why don't the predictions from the code and the application agree?

JF9199

2 Answers


Try reading your image with OpenCV and converting it to a PIL.Image before passing it to MLModel.predict().

The coremltools Data Structures and Feature Types documentation assumes, I think, that images used for Core ML image classification are decompressed. You can use OpenCV to re-encode the image as an uncompressed PNG. Depending on your color space, you may also need NumPy to change the image's bits per pixel.

import coremltools as ct
import cv2 as cv
import numpy as np  # only needed if you have to change bits per pixel
from PIL import Image


model = ct.models.MLModel('classification1.mlmodel')
img = cv.imread('path/to/image')
img = cv.cvtColor(img, cv.COLOR_BGR2RGB)
# Round-trip through an uncompressed (raw) PNG
_, buf = cv.imencode('.png', img, params=[cv.IMWRITE_PNG_COMPRESSION, 0])
img_decompressed = cv.imdecode(buf, cv.IMREAD_UNCHANGED)
img_pil = Image.fromarray(img_decompressed)
img_pil = img_pil.resize((299, 299), resample=Image.BICUBIC)
prediction = model.predict({'image': img_pil})
Castillo
  • I am still unable to get the same prediction after trying out your suggestion. – JF9199 Jul 04 '20 at 10:56
  • The Create ML app is too much of a black box, and its feature extractor VisionFeaturePrint_Scene is, as far as my research goes, a method that cannot yet be reproduced in Python. After trying, I decided not to use Create ML and switched to Turi Create (https://github.com/apple/turicreate). It's a library from Apple for creating mlmodels, but you avoid the VisionFeaturePrint_Scene feature extractor for image classification. Afterward, you can still use coremltools for predictions. I hope this helps. – Castillo Jul 05 '20 at 23:38

You don't need to do this:

img = PIL.Image.merge("RGB", (b, g, r))

Since the Core ML model already knows the input should be BGR, it will flip the color channels for you.

Since you already flipped them by hand, they now get flipped twice, which is not what you want.
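As a sketch, the question's loader with the manual channel swap removed would look like this (I've used Image.LANCZOS in place of the deprecated ANTIALIAS constant; otherwise the logic is unchanged):

```python
import PIL.Image

def load_image(path, resize_to=None):
    # Load as RGB and leave the channel order alone; Core ML performs
    # the RGB -> BGR conversion itself because the model's input
    # colorSpace is declared as BGR.
    img = PIL.Image.open(path).convert("RGB")
    if resize_to is not None:
        img = img.resize(resize_to, PIL.Image.LANCZOS)
    return img
```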

Matthijs Hollemans
  • I tried your suggestion. The predictions are still different – JF9199 Jun 25 '20 at 16:33
  • Make the image 299x299 pixels in an image editor, save it as PNG, then load it without resizing or any changes in Python. Does it still give very different results when compared to Create ML? I ask because things like resizing may be done differently in Python vs the Create ML app, and small differences in pixels can actually cause big differences in the predictions. – Matthijs Hollemans Jun 25 '20 at 19:41
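To make that comparison concrete, here is a small sketch (the file name img1_299.png is hypothetical) that only verifies the image is already the expected size, so no Python-side resizing or channel changes happen before model.predict():

```python
import PIL.Image

def load_preresized(path, expected_size=(299, 299)):
    # Open an image that was already resized in an image editor and
    # confirm it matches the model's declared input size, so that no
    # Python-side resampling can introduce pixel differences.
    img = PIL.Image.open(path).convert("RGB")
    if img.size != expected_size:
        raise ValueError(f"expected {expected_size}, got {img.size}")
    return img

# img = load_preresized('img1_299.png')    # hypothetical pre-resized file
# result = model.predict({'image': img})
```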