Optimized TensorFlow graph is slower than the original on Android

Question

I have a TensorFlow graph from darkflow on which I run inference on an Android device (on the CPU of a Snapdragon 820). I found the Graph Transform Tool (transform_graph), which is meant to optimize a model for deployment. I optimized my graph and expected it to be faster than before, but it got slower by around 10%.

What can cause that? What am I doing wrong?

Here are the details:

I use the tiny-yolo-voc model from darkflow without modification. I created the TensorFlow model like this:

$ ./flow --model cfg/tiny-yolo-voc.cfg --load bin/tiny-yolo-voc.weights --savepb --verbalise

I optimized the graph with the following command:

$ bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=../darkflow/darkflow/built_graph/tiny-yolo-voc.pb \
--out_graph=../darkflow/darkflow/built_graph/optimized-tiny-yolo-voc.pb \
--inputs='input' --outputs='output' \
--transforms='strip_unused_nodes(type=float, shape="1,299,299,3") fold_constants(ignore_errors=true) fold_batch_norms fold_old_batch_norms'

My code:

InferenceRunner.java:

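import android.content.Context;
import android.graphics.Bitmap;
import android.os.Trace;

import org.tensorflow.contrib.android.TensorFlowInferenceInterface;

public class InferenceRunner {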

    private static final String INPUT_NODE = "input";
    private static final String OUTPUT_NODE = "output";
    protected final TensorFlowInferenceInterface mInferenceInterface;
    private final int mGridSize;
    private final int mNumOfLabels;
    private int mInputSize;

    public InferenceRunner(Context context, String modelFile, int inputSize, int gridSize, int numOfLabels) {
        this.mInputSize = inputSize;
        this.mGridSize = gridSize;
        this.mNumOfLabels = numOfLabels;
        mInferenceInterface = new TensorFlowInferenceInterface(context.getAssets(), modelFile);
    }

    public synchronized void runInference(Bitmap image) {
        Trace.beginSection("imageTransform");
        Bitmap bitmap = Bitmap.createScaledBitmap(image, mInputSize, mInputSize, false);
        int[] intValues = new int[mInputSize * mInputSize];
        float[] floatValues = new float[mInputSize * mInputSize * 3];
        bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());

        for (int i = 0; i < intValues.length; ++i) {
            floatValues[i * 3 + 0] = ((intValues[i] >> 16) & 0xFF) / 255.0f;
            floatValues[i * 3 + 1] = ((intValues[i] >> 8) & 0xFF) / 255.0f;
            floatValues[i * 3 + 2] = (intValues[i] & 0xFF) / 255.0f;
        }
        Trace.endSection();

        Trace.beginSection("inferenceFeed");
        mInferenceInterface.feed(INPUT_NODE, floatValues, 1, mInputSize, mInputSize, 3);
        Trace.endSection();

        Trace.beginSection("inferenceRun");
        mInferenceInterface.run(new String[]{OUTPUT_NODE});
        Trace.endSection();

        final float[] resu =
                new float[mGridSize * mGridSize * (mNumOfLabels + 5) * 5];
        Trace.beginSection("inferenceFetch");
        mInferenceInterface.fetch(OUTPUT_NODE, resu);
        Trace.endSection();
    }
}
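For reference, the buffer sizes used in runInference can be checked with a few lines of plain Java. This is only a sketch of the arithmetic: `YoloTensorSizes` is a hypothetical helper, the trailing `* 5` is read here as the number of boxes predicted per grid cell, and the input size of 416 is an assumption, since the value of TINY_YOLO_INPUT_SIZE is not shown above.

```java
// Hypothetical helper, not part of the app: recomputes the array lengths
// that runInference allocates for the input and output tensors.
final class YoloTensorSizes {

    // One float per colour channel (R, G, B) for every pixel.
    static int inputLength(int inputSize) {
        return inputSize * inputSize * 3;
    }

    // Per grid cell: boxesPerCell boxes, each with 4 coordinates,
    // 1 confidence value and numLabels class scores.
    static int outputLength(int gridSize, int numLabels, int boxesPerCell) {
        return gridSize * gridSize * (numLabels + 5) * boxesPerCell;
    }

    public static void main(String[] args) {
        // Values from the constructor calls: grid 13, 20 labels.
        System.out.println(outputLength(13, 20, 5)); // prints 21125
        // 416 is an assumed input size, not taken from the code above.
        System.out.println(inputLength(416));        // prints 519168
    }
}
```

Both models are fed and fetched with the same array sizes, so the copy cost in inferenceFeed and inferenceFetch should be the same for either graph.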

MainActivity:onCreate():

...
tinyYolo = new InferenceRunner(getApplicationContext(), TINY_YOLO_MODEL_FILE, TINY_YOLO_INPUT_SIZE, 13, 20);
optimizedTinyYolo = new InferenceRunner(getApplicationContext(), OPTIMIZED_TINY_YOLO_MODEL_FILE, TINY_YOLO_INPUT_SIZE, 13, 20);
...

MainActivity:onResume():

...
mHandler.post(new Runnable() {
        @Override
        public void run() {
            Trace.beginSection("TinyYoloModel");
            for (int i = 0; i < 5; i++) {
                tinyYolo.runInference(b);
            }
            Trace.endSection();

            Log.d(TAG, "run: optimized");
            Trace.beginSection("OptimizedModel");
            for (int i = 0; i < 5; i++) {
                optimizedTinyYolo.runInference(b);
            }
            Trace.endSection();
        }
    });
...
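Since both loops always run in the same order, ordering effects (warm-up, thermal throttling) cannot be separated from a real difference between the two graphs. Below is a minimal sketch of an interleaved measurement in plain Java. `InterleavedBenchmark` and its parameters are hypothetical names, and the two `Runnable`s stand in for calls such as `tinyYolo.runInference(b)` and `optimizedTinyYolo.runInference(b)`. It is not meant to explain the gap, only to make the comparison less dependent on run order.

```java
// Hypothetical timing helper: warms both tasks up, then alternates
// which one is measured first so neither always runs on a "cold" or
// "hot" device.
final class InterleavedBenchmark {

    // Wall-clock time of a single run, in nanoseconds.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    // Returns { average ms of a, average ms of b } over `runs` timed runs.
    static double[] compare(Runnable a, Runnable b, int warmup, int runs) {
        // Untimed warm-up runs for both tasks.
        for (int i = 0; i < warmup; i++) {
            a.run();
            b.run();
        }
        long totalA = 0;
        long totalB = 0;
        for (int i = 0; i < runs; i++) {
            if (i % 2 == 0) {
                totalA += timeNanos(a);
                totalB += timeNanos(b);
            } else {
                totalB += timeNanos(b);
                totalA += timeNanos(a);
            }
        }
        return new double[] { totalA / 1e6 / runs, totalB / 1e6 / runs };
    }

    public static void main(String[] args) {
        // Dummy workloads; in the app these would be the two inference calls.
        Runnable first = () -> Math.sqrt(12345.0);
        Runnable second = () -> Math.sqrt(54321.0);
        double[] avg = compare(first, second, 2, 10);
        System.out.println("first avg ms:  " + avg[0]);
        System.out.println("second avg ms: " + avg[1]);
    }
}
```

With only 5 runs per model, a larger `runs` value would also make the averages less sensitive to a single slow iteration.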

My Systrace output:

TinyYoloModel wall duration is 5,525ms
OptimizedModel wall duration is 6,043ms
TinyYoloModel inferenceRun avg: 1051ms
OptimizedModel inferenceRun avg: 1158ms
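The "around 10%" figure follows directly from these numbers. A small sketch of the arithmetic, where `Slowdown` is a hypothetical helper name and the inputs are the Systrace values above:

```java
import java.util.Locale;

// Hypothetical helper: relative slowdown of `other` compared to `base`.
final class Slowdown {

    static double percentSlower(double baseMs, double otherMs) {
        return (otherMs - baseMs) / baseMs * 100.0;
    }

    public static void main(String[] args) {
        // Average inferenceRun time: 1051 ms vs 1158 ms.
        System.out.println(String.format(Locale.ROOT, "%.1f", percentSlower(1051, 1158))); // prints 10.2
        // Total wall duration of the 5-run sections: 5525 ms vs 6043 ms.
        System.out.println(String.format(Locale.ROOT, "%.1f", percentSlower(5525, 6043))); // prints 9.4
    }
}
```

So the slowdown is about 10.2% for the inferenceRun section alone and about 9.4% for the whole section, which also includes image transformation, feed, and fetch.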

Do you have any idea why the optimized model is slower?

If you need more info, feel free to comment! Thanks for your help.
