I am currently evaluating and comparing the inference performance of some TensorFlow Lite models on different smartphones, using the MNIST and CIFAR10 datasets. The strange thing is, whenever I try to speed up inference with hardware acceleration, the models perform worse than before. For example, these are the results I got on a Galaxy Note 20 Ultra, which definitely has a powerful GPU and NPU (all results are in milliseconds per inference):
Model     CPU     GPU     NNAPI
MNIST     0.040   2.322   2.839
CIFAR10   0.810   8.120   6.608
I warmed up the processing units before benchmarking and ran each inference many times, so these values are averages and should not be random outliers. Below is the code I use to configure the NNAPI or GPU delegate of TensorFlow Lite:
import android.os.Build
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.CompatibilityList
import org.tensorflow.lite.gpu.GpuDelegate
import org.tensorflow.lite.nnapi.NnApiDelegate

val model = loadModelFile(assetManager, modelPath)
val compatList = CompatibilityList()
var nnApiDelegate: NnApiDelegate? = null

val options = Interpreter.Options().apply {
    if (USE_NNAPI && Build.VERSION.SDK_INT >= Build.VERSION_CODES.P) {
        // NNAPI is available from Android 9 (API 28) onwards
        nnApiDelegate = NnApiDelegate()
        this.addDelegate(nnApiDelegate)
    } else if (USE_GPU && compatList.isDelegateSupportedOnThisDevice) {
        // use the GPU delegate options recommended for this device
        val delegateOptions = compatList.bestOptionsForThisDevice
        this.addDelegate(GpuDelegate(delegateOptions))
    } else {
        // if neither delegate is used, run on the CPU with 4 threads
        this.setNumThreads(4)
    }
}
val interpreter = Interpreter(model, options)
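For context, a single inference then looks roughly like this (a minimal sketch; the buffer shapes are assumptions based on the input sizes described in the edit below):

// minimal sketch of one inference; the NHWC shape [1, 24, 24, 1] is an
// assumption based on the MNIST input size described in the edit below
val input = Array(1) { Array(24) { Array(24) { FloatArray(1) } } }
val output = Array(1) { FloatArray(10) } // assuming 10 class scores
interpreter.run(input, output)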
Does anybody know what the reason for this could be, or how to fix it? Thanks in advance for any tips or clues!
EDIT:
Input size MNIST: 24 x 24 x 255
Input size CIFAR10: 32 x 32 x 3 x 255
I measure the inference times by timing a few thousand consecutive inferences on the device and then calculating the average.
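In code, the timing loop is essentially the following (a minimal sketch; N_WARMUP and N_RUNS are placeholder constants, input and output as in the snippet above):

// placeholder constants for the warm-up and measured runs
val N_WARMUP = 100
val N_RUNS = 5000
repeat(N_WARMUP) { interpreter.run(input, output) } // warm-up, not timed
val start = System.nanoTime()
repeat(N_RUNS) { interpreter.run(input, output) }
val avgMs = (System.nanoTime() - start) / 1e6 / N_RUNS // average ms per inference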