
We are using the TensorFlow C API, version 1.13.1, from https://www.tensorflow.org/install/lang_c.

Our neural network model, frozen_graph.pb, uses the MobileNet architecture and is 230 MB on disk. When we load it, TensorFlow allocates about 1.1 GB during the first session run; the allocation then drops to ~900 MB and stays at that level.
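For reference, our loading path follows the standard C API pattern; below is a minimal sketch (error handling trimmed, default session options, file name from our setup, everything else illustrative):

```c
#include <stdio.h>
#include <stdlib.h>
#include <tensorflow/c/c_api.h>

int main(void) {
  /* Read frozen_graph.pb into memory. */
  FILE* f = fopen("frozen_graph.pb", "rb");
  fseek(f, 0, SEEK_END);
  long size = ftell(f);
  fseek(f, 0, SEEK_SET);
  void* data = malloc((size_t)size);
  fread(data, 1, (size_t)size, f);
  fclose(f);

  TF_Buffer* graph_def = TF_NewBufferFromString(data, (size_t)size);
  free(data); /* TF_NewBufferFromString copies the bytes. */

  /* Import the GraphDef into a graph. */
  TF_Status* status = TF_NewStatus();
  TF_Graph* graph = TF_NewGraph();
  TF_ImportGraphDefOptions* opts = TF_NewImportGraphDefOptions();
  TF_GraphImportGraphDef(graph, graph_def, opts, status);
  if (TF_GetCode(status) != TF_OK) {
    fprintf(stderr, "import failed: %s\n", TF_Message(status));
    return 1;
  }

  /* Create the session; the ~1.1 GB peak appears on the first
     TF_SessionRun, when kernels and buffers are materialized. */
  TF_SessionOptions* sess_opts = TF_NewSessionOptions();
  TF_Session* session = TF_NewSession(graph, sess_opts, status);

  /* ... TF_SessionRun(session, ...) with input/output tensors ... */

  TF_DeleteSession(session, status);
  TF_DeleteSessionOptions(sess_opts);
  TF_DeleteImportGraphDefOptions(opts);
  TF_DeleteBuffer(graph_def);
  TF_DeleteGraph(graph);
  TF_DeleteStatus(status);
  return 0;
}
```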

We tried the Graph Transform tool from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md to optimize the graph, but only quantization proved effective at reducing both model size and memory usage. Unfortunately, we cannot use it, because quantization reduces the model's accuracy by 15%.
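The invocation we used was roughly of the form shown in that README (paths and input/output node names below are placeholders; `quantize_weights` is the transform that shrank the model but cost the accuracy):

```
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=frozen_graph.pb \
  --out_graph=optimized_graph.pb \
  --inputs='input' \
  --outputs='output' \
  --transforms='
    strip_unused_nodes
    fold_constants(ignore_errors=true)
    fold_batch_norms
    quantize_weights'
```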

As we currently see it, the only ways to reduce the model size without drastically affecting accuracy are:

1) Move to another backend such as MXNet.

2) Use the knowledge distillation technique to retrain smaller models: https://arxiv.org/pdf/1503.02531.pdf

We expect that the memory allocation for a single model does not exceed 150% of its binary size. Any solution is acceptable. Thank you.

user10333
  • *"We are expecting that memory allocation for single model do not exceeds 150% of binary size"* - Why? – Holt May 27 '19 at 16:41
  • Because we have to support multiple instance of same model in server so we have to diminish memory as much as possible – user10333 May 28 '19 at 06:06
  • I am not asking why you want to reduce the model memory consumption, but why you assume that the model in memory should be at most 150% the size of the model in its binary form? – Holt May 28 '19 at 07:05
  • Because i wrote my small back end in c++ and there memory usage not exceeds 140% of binary model size. – user10333 May 28 '19 at 11:58
  • Also mxnet is the same not more then 170% memory allocated for models – user10333 Jun 11 '19 at 08:00

0 Answers