I successfully converted a TensorFlow model to a TensorFlow Lite float16 model by following the Post-training float16 quantization guide.
Below is a diagram of the converted model.
I ran it successfully on a MatePad Pro (Kirin 990) with my C++ code.
The only NNAPI-specific calls I added are SetAllowFp16PrecisionForFp32 and UseNNAPI, placed before AllocateTensors:
m_interpreter->SetAllowFp16PrecisionForFp32(true);  // allow fp32 ops to be computed in fp16
m_interpreter->UseNNAPI(true);                      // route execution through NNAPI
m_interpreter->AllocateTensors();
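For reference, here is how I believe the newer, explicit NNAPI delegate API would be wired up instead of the legacy calls above. This is only a sketch: I am assuming a TensorFlow Lite version that ships StatefulNnApiDelegate with the allow_fp16, accelerator_name and disallow_nnapi_cpu options, and "liteadapter" is simply the device name I see in logcat.

#include <memory>
#include "tensorflow/lite/delegates/nnapi/nnapi_delegate.h"

tflite::StatefulNnApiDelegate::Options options;
options.allow_fp16 = true;                 // counterpart of SetAllowFp16PrecisionForFp32
options.accelerator_name = "liteadapter";  // assumption: pin to the device name seen in logcat
options.disallow_nnapi_cpu = true;         // fail instead of silently using nnapi-reference

// The delegate must outlive the interpreter that uses it.
auto nnapi_delegate = std::make_unique<tflite::StatefulNnApiDelegate>(options);
if (m_interpreter->ModifyGraphWithDelegate(nnapi_delegate.get()) != kTfLiteOk) {
  // Delegation failed; the model will run entirely on the TF Lite CPU kernels.
}
m_interpreter->AllocateTensors();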
However, the performance is poor.
I checked the logs with adb logcat and found that both armnn and liteadapter (the latter of which I believe is Huawei's NNAPI driver) fail to support major operations such as CONV_2D, so nnapi-reference, the CPU implementation of NNAPI, executes them as a fallback.
The messages look like this:
AndroidNN: AnnOpConvParser::isSupport1_1(280)::"Conv para is model Input err"
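To see which accelerators NNAPI actually exposes on this device (and their exact names, in case I want to pin the delegate to one of them), my understanding is that the NNAPI C API from API level 29 onward can enumerate them. A minimal sketch, assuming the app links against libneuralnetworks and liblog:

#include <android/NeuralNetworks.h>
#include <android/log.h>

// Log every NNAPI device (e.g. armnn, liteadapter, nnapi-reference).
void LogNnapiDevices() {
  uint32_t device_count = 0;
  if (ANeuralNetworks_getDeviceCount(&device_count) != ANEURALNETWORKS_NO_ERROR) return;
  for (uint32_t i = 0; i < device_count; ++i) {
    ANeuralNetworksDevice* device = nullptr;
    const char* name = nullptr;
    if (ANeuralNetworks_getDevice(i, &device) == ANEURALNETWORKS_NO_ERROR &&
        ANeuralNetworksDevice_getName(device, &name) == ANEURALNETWORKS_NO_ERROR) {
      __android_log_print(ANDROID_LOG_INFO, "NnapiDevices", "device %u: %s", i, name);
    }
  }
}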
Why do the NNAPI drivers other than nnapi-reference fail to support these operations?
And how can I fix this?
My guess is that the Dequantize operations in the converted model should not be there, and that each operation should instead take float16 parameters directly.
I don't know whether this guess is right, and even if it is, I have no idea how to eliminate the Dequantize operations.
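To check which nodes actually end up delegated (and whether the Dequantize nodes are among the leftovers), my understanding is that the interpreter's execution plan can be dumped after the delegate has been applied: the ops taken by NNAPI should be collapsed into DELEGATE nodes, so anything else listed still runs on the TF Lite CPU kernels. A sketch, assuming the usual TF Lite C++ headers:

#include <cstdio>
#include "tensorflow/lite/builtin_ops.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Print every node remaining in the execution plan with its operator name.
void DumpExecutionPlan(const tflite::Interpreter& interpreter) {
  for (int node_index : interpreter.execution_plan()) {
    const auto* node_and_reg = interpreter.node_and_registration(node_index);
    if (node_and_reg == nullptr) continue;
    const TfLiteRegistration& reg = node_and_reg->second;
    const char* name = (reg.builtin_code == kTfLiteBuiltinCustom ||
                        reg.builtin_code == kTfLiteBuiltinDelegate)
                           ? (reg.custom_name ? reg.custom_name : "custom")
                           : tflite::EnumNameBuiltinOperator(
                                 static_cast<tflite::BuiltinOperator>(reg.builtin_code));
    std::printf("node %d: %s\n", node_index, name);
  }
}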
(I did, of course, also try the float32 converted model. Its outputs were quite different between SetAllowFp16PrecisionForFp32(false) and SetAllowFp16PrecisionForFp32(true), so I concluded that float16 quantization is needed for NNAPI.)
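To put a number on "quite different", this is roughly how I compare the outputs: run the same input through a plain CPU interpreter and through the NNAPI-enabled one, then take the largest element-wise difference of the first output tensor. A sketch, assuming a single float32 output of identical shape in both interpreters:

#include <algorithm>
#include <cmath>
#include "tensorflow/lite/interpreter.h"

// Largest absolute element-wise difference between the first outputs of two
// interpreters that were both invoked on the same input.
float MaxAbsOutputDiff(const tflite::Interpreter& cpu,
                       const tflite::Interpreter& nnapi) {
  const TfLiteTensor* a = cpu.tensor(cpu.outputs()[0]);
  const TfLiteTensor* b = nnapi.tensor(nnapi.outputs()[0]);
  const size_t n = a->bytes / sizeof(float);
  float max_diff = 0.0f;
  for (size_t i = 0; i < n; ++i) {
    max_diff = std::max(max_diff, std::fabs(a->data.f[i] - b->data.f[i]));
  }
  return max_diff;
}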
Below is a summary of my observations, assuming UseNNAPI(true):
- a float32 model with SetAllowFp16PrecisionForFp32(true) makes liteadapter run, but the output is wrong.
- a float32 model with SetAllowFp16PrecisionForFp32(false) makes armnn run as a fallback.
- a float16 model with SetAllowFp16PrecisionForFp32(true or false) makes nnapi-reference run as a fallback.
Any advice would be appreciated!