
OS: macOS Catalina 10.15.2
Xcode: 11.3
Core ML 3.0

I give the same input to the same .mlmodel, but the inference results differ between running on the CPU and running on the GPU.

The results are shown below: the left file is the inference result (second column) using CPU only, and the right file is the inference result (second column) using CpuAndGpu. I used Beyond Compare to diff the two files; the values marked in red are the differences.
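For reference, the two output files can be produced with something like the sketch below. It assumes the coremltools 3-era Python API, where predict() takes a useCPUOnly flag, and uses placeholder names for the model path and the input/output features; on device the same choice is made via MLModelConfiguration.computeUnits (.cpuOnly vs .cpuAndGPU).

```python
# Sketch: run the same input once forcing the CPU and once letting Core ML
# use the GPU, then dump both outputs for comparison.
# "model.mlmodel", "input", and "output" are placeholder names.
import numpy as np
import coremltools

model = coremltools.models.MLModel("model.mlmodel")
x = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}

cpu_out = model.predict(x, useCPUOnly=True)["output"]   # CPU path, float32
gpu_out = model.predict(x, useCPUOnly=False)["output"]  # may run on the GPU, float16

np.savetxt("cpu.txt", np.asarray(cpu_out).flatten())
np.savetxt("gpu.txt", np.asarray(gpu_out).flatten())
```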

Does anyone know what causes this difference and how to fix it?

(Screenshot: Beyond Compare diff of the two output files, with the differing values highlighted in red.)

1 Answer


This is not a problem per se. On the GPU, 16-bit floats are used while on the CPU 32-bit floats are used. 16-bit floats have less precision, which explains the different results you're getting.
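To get a feel for the size of the rounding, here is a quick NumPy sketch (purely illustrative, not Core ML itself):

```python
# Rounding a float32 value to float16 typically changes it around the
# 3rd-4th significant digit.
import numpy as np

x32 = np.float32(0.123456789)
x16 = x32.astype(np.float16)

print(x32)                               # 0.12345679
print(x16)                               # 0.1235 (float16 keeps ~3 significant digits)
print(abs(np.float32(x16) - x32) / x32)  # relative error on the order of 1e-4
```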

Some numbers will be slightly larger, some will be slightly smaller, but generally these effects cancel out and you won't notice the difference.
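So rather than expecting the two result files to match exactly, compare them within a float16-sized tolerance. A sketch (cpu.txt / gpu.txt are placeholder names for the two result columns):

```python
# Sketch: compare CPU and GPU outputs with a tolerance instead of exact equality.
import numpy as np

cpu = np.loadtxt("cpu.txt")
gpu = np.loadtxt("gpu.txt")

# rtol around 1e-3 is a reasonable bound for float16 rounding
print(np.allclose(cpu, gpu, rtol=1e-3, atol=1e-4))
print(np.max(np.abs(cpu - gpu)))  # largest absolute difference
```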

(However, if your model generates images, you may notice pixel artifacts from the lower precision provided by the 16-bit floats.)

Matthijs Hollemans
  • I have two more questions. First, I used coremltools to do 16-bit quantization of the model, but the inference speed on device did not improve; only the model size was reduced. Second, some detection models lose a lot of precision with coremltools' 16-bit quantization, yet the same models running on device (GPU, 16-bit) don't lose precision. So why is 16-bit on the GPU (or NPU) on device better than 16-bit quantization with coremltools? Maybe the difference is in the method used to quantize the mlmodel? Thanks @Matthijs Hollemans – YiZhaoYanBo Apr 15 '20 at 14:27
  • Quantization only affects the way the weights are stored in the model, not what happens during runtime. On the GPU/ANE, the network always runs with 16-bit floats, no matter how the weights are quantized (or not). – Matthijs Hollemans Apr 15 '20 at 16:39
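For illustration, the weight-only 16-bit quantization mentioned in the comments looks roughly like this with the coremltools 3-era quantization_utils API (placeholder paths). It only changes how the weights are stored inside the .mlmodel, which is why the file gets smaller while runtime speed and GPU/ANE precision stay the same:

```python
# Sketch (coremltools 3-era API, placeholder paths): weight-only 16-bit
# quantization shrinks the stored weights; at runtime the GPU/ANE still
# computes in float16 whether or not the weights were quantized.
import coremltools
from coremltools.models.neural_network import quantization_utils

model_fp32 = coremltools.models.MLModel("model.mlmodel")
model_fp16 = quantization_utils.quantize_weights(model_fp32, nbits=16)
model_fp16.save("model_fp16.mlmodel")
```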