Mxnet on mobile GPU

Question

I want to run a Neural Network on mobile. Currently, I am exploring Mxnet (http://mxnet.io) framework for deploying it (only for Inference). As I am concerned about the execution time performance on mobile, I want to know if it runs on the GPU of mobile phones (Android/iOS). The documentation mentions that it can use multiple CPUs as well as GPUs for training, but it is still not clear if it can GPU of mobile phone for inference on mobile. It mentions about dependency on BLAS, because of which it seems it uses CPU on mobile. Could anyone please tell me if I can use mobile GPU with mxnet for inference? If not, what are my other options?

kris larson · Accepted Answer · 2018-09-04T18:52:57.110

UPDATE: The Neural Networks API is now available on Android devices starting from API 27 (Oreo 8.1). The API provides a lower-level facility that a higher-level machine learning framework (e.g. Tensorflow, Caffe) can use to build models. It is a C-language API that can be accessed through the Android Native Development Kit (NDK).

NNAPI gives hardware vendors a Service Provider Interface (SPI) to provide drivers for computational hardware such as Graphics Processing Units (GPUs) and Digital Signal Processors (DSPs). As a result, the NNAPI provides an abstraction for high performance computation. There is a CPU fallback in case no hardware acceleration drivers are present.

For those wanting to implement a machine learning model on Android, the framework of choice is now Tensorflow Lite. Tensorflow Lite for Android is implemented on top of the NNAPI, so Tensorflow models will get hardware acceleration when available. Tensorflow Lite has other optimizations to squeeze more performance out of the mobile platform.

The process goes as follows:

Develop and train your model on Keras (using Tensorflow backend)
Or use a pretrained model
Save a "frozen" model in Tensorflow protobuf format
Use the Tensorflow Optimizing Converter to convert the protobuf into a "pre-parsed" tflite model format

See the Tensorflow Lite Developer Guide

I went through an exercise of creating a neural net application for Android using Deeplearning4j. Because Deeplearning4j is based on Java, I thought it would be a good match for Android. Based on my experience with this, I can answer some of your questions.

To answer your most basic question:

Could anyone please tell me if I can use mobile GPU with mxnet for inference?

The answer is: No. The explanation for this follows.

It mentions about dependency on BLAS, because of which it seems it uses CPU on mobile.

BLAS (Basic Linear Algebraic Subprograms) is at the heart of AI computation. Because of the sheer amount of number-crunching involved in these complex models the math routines must be optimized as much as possible. The computational firepower of GPUs make them ideal processors for AI models.

It appears that MXNet can use Atlas (libblas), OpenBLAS, and MKL. These are CPU-based libraries.

Currently the main (and — as far as I know — only) option for running BLAS on a GPU is CuBLAS, developed specifically for NVIDIA (CUDA) GPUs. Apparently MXNet can use CuBLAS in addition to the CPU libraries.

The GPU in many mobile devices is a lower-power chip that works with ARM architectures which doesn't have a dedicated BLAS library yet.

what are my other options?

Just go with the CPU. Since it's the training that's extremely compute-intensive, using the CPU for inference isn't the show-stopper you think it is. In OpenBLAS, the routines are written in assembly and hand-optimized for each CPU it can run on. This includes ARM.
Do the recognition on the server. After working on another demo application which sent an image to the server that performed the recognition and returned results to the device, I think this approach has some benefits for the user such as better overall response time and not having to find space to install a 100MB(!) application.

Since you also tagged iOS, using a C++-based framework like MXNet is probably the best choice if you are trying to go cross-platform. But their method of creating one giant .cpp file and providing a simple native interface might not give you enough flexibility for Android. Deeplearning4j has solved that problem pretty elegantly by using JavaCPP which takes the complexity out of JNI.

Mxnet on mobile GPU

1 Answers1