
I am currently evaluating and comparing the performance of some TensorFlow models on different smartphones. I am testing on the MNIST and CIFAR10 datasets. The strange thing is, when I try to speed up the inference times with hardware acceleration, they always turn out worse than before. For example, these are the results I got on a Galaxy Note 20 Ultra, which definitely has powerful GPUs and NPUs (all results are milliseconds per inference):

MNIST CPU: 0.040
MNIST GPU: 2.322
MNIST NNAPI: 2.839

CIFAR10 CPU: 0.810
CIFAR10 GPU: 8.120
CIFAR10 NNAPI: 6.608

I warmed up the processing unit before the benchmark and executed the inferences many times, so these are averages and should not be random. Below is the code I use to configure the NNAPI or GPU delegate of TensorFlow Lite:

import android.os.Build
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.CompatibilityList
import org.tensorflow.lite.gpu.GpuDelegate
import org.tensorflow.lite.nnapi.NnApiDelegate

val model = loadModelFile(assetManager, modelPath)
val compatList = CompatibilityList()
var nnApiDelegate: NnApiDelegate? = null

val options = Interpreter.Options().apply {
    if (USE_NNAPI && Build.VERSION.SDK_INT >= Build.VERSION_CODES.P) {
        // NNAPI delegate (requires Android 9+); keep a reference so it can be closed later
        nnApiDelegate = NnApiDelegate()
        this.addDelegate(nnApiDelegate)
    } else if (USE_GPU && compatList.isDelegateSupportedOnThisDevice) {
        // GPU delegate configured with the options recommended for this device
        val delegateOptions = compatList.bestOptionsForThisDevice
        this.addDelegate(GpuDelegate(delegateOptions))
    } else {
        // if no accelerator is used, run on 4 CPU threads
        this.setNumThreads(4)
    }
}

val interpreter = Interpreter(model, options)
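
The interpreter and the NNAPI delegate hold native resources, so they should be closed once the benchmark is done, e.g.:

interpreter.close()
nnApiDelegate?.close()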

Does anybody know what the reason for this could be, or how to fix it? Thanks in advance for any tips or clues!

EDIT: Input size MNIST: 24 x 24 x 255. Input size CIFAR10: 32 x 32 x 3 x 255.

I measure the inference times by timing a few thousand inferences on the device and then calculating the average afterwards.
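
A minimal sketch of that measurement loop (the tensor shapes and iteration counts here are placeholders, not my exact setup):

// Warm-up + averaging sketch; input/output shapes must match the model's tensors
val input = Array(1) { Array(28) { Array(28) { FloatArray(1) } } }   // placeholder input
val output = Array(1) { FloatArray(10) }                             // placeholder output

repeat(100) { interpreter.run(input, output) }                       // warm-up

val runs = 5000
val startNs = System.nanoTime()
repeat(runs) { interpreter.run(input, output) }
val avgMs = (System.nanoTime() - startNs) / 1e6 / runs
println("average inference time: %.3f ms".format(avgMs))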

stefferino
  • You don't mention the size of the inputs or how you are getting the inference time. More of an FYI: https://ai-benchmark.com/index.html There is a link in the upper right of the site to [their research papers](https://ai-benchmark.com/research.html). – Morrison Chang Jul 26 '22 at 21:27

1 Answer


It seems that both models are already performing well on the CPU, with inference latencies below 1 ms.

Accelerators are not always faster than the CPU. There is often some overhead in dispatching work to an accelerator. Also, accelerators may run certain models/operators very well, but they may not support all the operators that the CPU supports. Additionally, the CPU can simply be as fast or even faster when a model's performance is memory-bound.

It might be worth trying some larger vision models, e.g. mobilenet_v1_1.0_224, to see if there is a speedup with the GPU or other accelerators.
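
For illustration, a minimal sketch of such a test with the GPU delegate (the model file name, the 1x224x224x3 input and the 1001-class output are assumptions based on the standard float MobileNet V1 model; loadModelFile is the helper from your question):

// Sketch: timing a larger float model with the GPU delegate.
// "mobilenet_v1_1.0_224.tflite" and the tensor shapes below may differ for your file.
val gpuOptions = Interpreter.Options().apply {
    val compat = CompatibilityList()
    if (compat.isDelegateSupportedOnThisDevice) {
        addDelegate(GpuDelegate(compat.bestOptionsForThisDevice))
    }
}
val mobilenet = Interpreter(loadModelFile(assetManager, "mobilenet_v1_1.0_224.tflite"), gpuOptions)

val input = Array(1) { Array(224) { Array(224) { FloatArray(3) } } }
val output = Array(1) { FloatArray(1001) }

repeat(50) { mobilenet.run(input, output) }   // warm-up
val runs = 500
val t0 = System.nanoTime()
repeat(runs) { mobilenet.run(input, output) }
println("mobilenet_v1_1.0_224 GPU avg: %.3f ms".format((System.nanoTime() - t0) / 1e6 / runs))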

Miao Wang