
I am trying to utilize Neon acceleration for TFLite inference on an Android device. While this appears to be well documented and straightforward for Java, I could use help in getting started with the C++ API. I am new to this, so my apologies if the answer is obvious.

The TensorFlow Lite source tree contains NEON kernels under https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/kernels, but I am wondering how and where to include and use them.

The device specs are: octa-core processor, 2000 MHz, ARM Cortex-A75 and ARM Cortex-A53, 64-bit, 10 nm; CPU 2x2.0 GHz Kryo 360 Gold & 6x1.7 GHz Kryo 360 Silver; GPU Adreno 615.

What I've tried so far: I changed the build.gradle file from

android {
    externalNativeBuild {
        cmake {
            path file('CMakeLists.txt')
        }
    }
}

to

android {
    defaultConfig {
        externalNativeBuild {
            cmake {
                arguments "-DANDROID_ARM_NEON=TRUE"
            }
        }
    }
    externalNativeBuild {
        cmake {
            path file('CMakeLists.txt')
        }
    }
}

After this change, inference took just as long as before, and I got the following error after inference finished:

A/libc: Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x753b9d3000 in tid 9935 (m.example.test2), pid 9935 (m.example.test2)
  • From what I understand, NEON support already exists and is used where appropriate on the CPU path. You don't mention the type of model you are processing, but [here is my answer regarding the state of NNAPI](https://stackoverflow.com/a/54558864/295004) which might be useful, as it links to benchmark papers. – Morrison Chang Oct 18 '20 at 01:16

1 Answer


You can opt in to the XNNPACK delegate, which uses ARM NEON-optimized kernels when your CPU supports them.

https://blog.tensorflow.org/2020/07/accelerating-tensorflow-lite-xnnpack-integration.html

It's much easier to enable XNNPACK with the Java / Obj-C / Swift APIs by setting a boolean flag, as explained in the blog post. If you need to use the C++ API directly for some reason, you can do something like this:

#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"

// ...

TfLiteXNNPackDelegateOptions options = TfLiteXNNPackDelegateOptionsDefault();
// options.num_threads = <desired_num_threads>;
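// Wrap the delegate in a smart pointer with a custom deleter so it is
// released via TfLiteXNNPackDelegateDelete when no longer needed.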
tflite::Interpreter::TfLiteDelegatePtr delegate(
    TfLiteXNNPackDelegateCreate(&options),
    [](TfLiteDelegate* delegate) { TfLiteXNNPackDelegateDelete(delegate); });
auto status = interpreter->ModifyGraphWithDelegate(std::move(delegate));
// check on the returned status code ...
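In case it helps to see where that call sits, here is a minimal end-to-end sketch (my own, not from the blog post) of loading a model, applying the XNNPACK delegate, and preparing the interpreter for inference. The helper name, model path, and thread count are placeholders you would replace with your own:

#include <memory>
#include <utility>

#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

std::unique_ptr<tflite::Interpreter> BuildXnnpackInterpreter(const char* model_path) {
  // Load the .tflite flatbuffer from disk.
  auto model = tflite::FlatBufferModel::BuildFromFile(model_path);
  if (!model) return nullptr;

  // Build the interpreter with the built-in op resolver.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  if (tflite::InterpreterBuilder(*model, resolver)(&interpreter) != kTfLiteOk) {
    return nullptr;
  }

  // Create the XNNPACK delegate and hand ownership to the interpreter.
  TfLiteXNNPackDelegateOptions options = TfLiteXNNPackDelegateOptionsDefault();
  options.num_threads = 4;  // tune for your device
  tflite::Interpreter::TfLiteDelegatePtr delegate(
      TfLiteXNNPackDelegateCreate(&options),
      [](TfLiteDelegate* d) { TfLiteXNNPackDelegateDelete(d); });
  if (interpreter->ModifyGraphWithDelegate(std::move(delegate)) != kTfLiteOk) {
    return nullptr;
  }

  // Allocate tensors once the delegate is applied; the caller can then
  // fill the input tensors and call interpreter->Invoke().
  if (interpreter->AllocateTensors() != kTfLiteOk) return nullptr;
  return interpreter;
}

How much this speeds things up depends on the model, since XNNPACK covers a subset of ops (mostly float); unsupported ops fall back to the default CPU kernels.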

See also how the Java API calls the C++ API internally to enable the XNNPACK delegate.

yyoon