I am trying to use a quantized model with TensorFlow Lite Micro, and I get a segmentation fault inside the interpreter->Invoke() call.
The debugger showed that the fault occurs on returning from Eval() in conv.cc at Node 28 (CONV_2D), with the stack corrupted. Built with the compiler flags "-fstack-protector-all -Wstack-protector", the error message is:
*** stack smashing detected ***: <unknown> terminated
My test is simply the person detection example with the model replaced by Mobilenet_V1_0.25_224_quant from the TensorFlow Lite pre-trained models site, kTensorArenaSize increased sufficiently, the model input/output sizes changed to 224x224x3 and 1x1001, and the additional required operators pulled in.
I also tried a few different models: another quantized model, Mobilenet_V1_0.25_192_quant, shows the same segfault, but the regular floating-point models Mobilenet_V1_0.25_192 and Mobilenet_V1_0.25_224 run fine over many loops.
Has anyone seen a similar problem? Or are there limitations in TensorFlow Lite Micro that I should be aware of?
This problem can be reproduced at this commit of my forked tensorflow repo.
Build command:
$ bazel build //tensorflow/lite/micro/examples/person_detection:person_detection -c dbg --copt=-fstack-protector-all --copt=-Wstack-protector --copt=-fno-omit-frame-pointer
And run:
$ ./bazel-bin/tensorflow/lite/micro/examples/person_detection/person_detection
Files changed:
tensorflow/lite/micro/examples/person_detection/main_functions.cc
tensorflow/lite/micro/examples/person_detection/model_settings.h
tensorflow/lite/micro/examples/person_detection/person_detect_model_data.cc
Changes in main_functions.cc:
constexpr int kTensorArenaSize = 1400 * 1024;
static tflite::MicroOpResolver<5> micro_op_resolver;
micro_op_resolver.AddBuiltin(tflite::BuiltinOperator_RESHAPE,
                             tflite::ops::micro::Register_RESHAPE());
micro_op_resolver.AddBuiltin(tflite::BuiltinOperator_SOFTMAX,
                             tflite::ops::micro::Register_SOFTMAX(), 1, 2);
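For context, the resolver keeps the three operators that the stock person_detection example already registers; a sketch of those registrations (same pattern as above; whether version arguments are needed depends on the model, as with SOFTMAX):
// Operators kept from the original example; together with RESHAPE and
// SOFTMAX above this makes the 5 operators that MicroOpResolver<5> holds.
micro_op_resolver.AddBuiltin(tflite::BuiltinOperator_DEPTHWISE_CONV_2D,
                             tflite::ops::micro::Register_DEPTHWISE_CONV_2D());
micro_op_resolver.AddBuiltin(tflite::BuiltinOperator_CONV_2D,
                             tflite::ops::micro::Register_CONV_2D());
micro_op_resolver.AddBuiltin(tflite::BuiltinOperator_AVERAGE_POOL_2D,
                             tflite::ops::micro::Register_AVERAGE_POOL_2D());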
Changes in model_settings.h:
constexpr int kNumCols = 224;
constexpr int kNumRows = 224;
constexpr int kNumChannels = 3;
constexpr int kCategoryCount = 1001;
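For context, model_settings.h also derives the input buffer size from these constants (as in the stock person_detection example), so it tracks the new dimensions automatically:
// Input image size in bytes for 224x224x3 uint8 data.
constexpr int kMaxImageSize = kNumCols * kNumRows * kNumChannels;  // 150528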
The last changed file, person_detect_model_data.cc, is pretty big; please see the full file on GitHub.
Updated on March 28, 2020: I also tested on a Raspberry Pi 3; the results are the same as on x86 Ubuntu 18.04.
pi@raspberrypi:~/tests $ ./person_detection
*** stack smashing detected ***: <unknown> terminated
Aborted
Thanks for your help.
Problem root cause found - updated on April 2, 2020:
I found that the problem is caused by an array overrun in the per-layer operation data. TensorFlow Lite Micro has a hidden limit (or one I missed in the documentation; at least the runtime does not check it) of at most 256 output channels, hard-coded in the OpData structure of conv.cc:
constexpr int kMaxChannels = 256;
...
struct OpData {
  ...
  // Per channel output multiplier and shift.
  // TODO(b/141139247): Allocate these dynamically when possible.
  int32_t per_channel_output_multiplier[kMaxChannels];
  int32_t per_channel_output_shift[kMaxChannels];
  ...
};
The MobileNet model Mobilenet_V1_0.25_224_quant.tflite has 1000 output classes and a total of 1001 channels internally. This causes stack corruption in tflite::PopulateConvolutionQuantizationParams() (tensorflow/lite/kernels/kernel_util.cc:90) for the last CONV_2D, whose output size is 1001.
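A minimal standalone sketch of the overrun pattern (names are hypothetical stand-ins, not the actual TFLM code; the real loop lives in PopulateConvolutionQuantizationParams and writes one entry per output channel):
#include <cstdint>

constexpr int kMaxChannels = 256;

// Mirrors the OpData arrays above; in the kernel this object sits on the
// stack, so overrunning it clobbers the enclosing stack frame.
struct OpData {
  int32_t per_channel_output_multiplier[kMaxChannels];
  int32_t per_channel_output_shift[kMaxChannels];
};

// Stand-in for the per-channel quantization loop: one write per channel,
// with no bounds check against kMaxChannels.
void PopulatePerChannelParams(OpData* data, int num_channels) {
  for (int i = 0; i < num_channels; ++i) {
    data->per_channel_output_multiplier[i] = 0;  // overruns once i >= 256
    data->per_channel_output_shift[i] = 0;
  }
}

int main() {
  OpData data;
  // 1001 channels, as in the last CONV_2D of this model; with
  // -fstack-protector-all this aborts with "stack smashing detected".
  PopulatePerChannelParams(&data, 1001);
  return 0;
}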
TF and TF Lite have no problem, as they apparently do not use this structure definition.
Confirmed: after increasing kMaxChannels to 1024, many loops of model evaluation calls run cleanly.
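The local workaround I tested (not a proper fix):
// In the OpData definition in conv.cc: raise the limit above the model's
// widest output (1001 channels here).
constexpr int kMaxChannels = 1024;  // was 256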
Most TF Lite Micro use cases probably involve small models and won't run into this problem. Still, perhaps this limit should be better documented and/or checked at run time?
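For example, a guard along these lines (a sketch only, not the actual TFLM code; the function name and the NHWC channel-index assumption are mine):
#include "tensorflow/lite/c/common.h"

// Could be called from conv.cc's Prepare() before the per-channel arrays
// are populated, failing gracefully instead of corrupting the stack.
// kMaxChannels is the conv.cc constant shown above.
TfLiteStatus CheckConvChannelLimit(TfLiteContext* context,
                                   const TfLiteTensor* output) {
  const int num_channels = output->dims->data[3];  // NHWC: dim 3 = channels
  TF_LITE_ENSURE_MSG(context, num_channels <= kMaxChannels,
                     "Conv output channels exceed kMaxChannels.");
  return kTfLiteOk;
}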