I have a TensorFlow graph from darkflow on which I run inference on an Android device (CPU: Snapdragon 820). I found the Graph Transform Tool for optimizing models for deployment, so I optimized my graph and expected it to be faster than before, but it got slower by around 10%.
What could cause this? What am I doing wrong?
Here are the details:
- I use the tiny-yolo-voc model from darkflow without modification. I created the TensorFlow model with:
$ ./flow --model cfg/tiny-yolo-voc.cfg --load bin/tiny-yolo-voc.weights --savepb --verbalise
- I optimized the graph with the following command:
$ bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
    --in_graph=../darkflow/darkflow/built_graph/tiny-yolo-voc.pb \
    --out_graph=../darkflow/darkflow/built_graph/optimized-tiny-yolo-voc.pb \
    --inputs='input' --outputs='output' \
    --transforms='strip_unused_nodes(type=float, shape="1,299,299,3") fold_constants(ignore_errors=true) fold_batch_norms fold_old_batch_norms'
- My code:
InferenceRunner.java:
import android.content.Context;
import android.graphics.Bitmap;
import android.os.Trace;

import org.tensorflow.contrib.android.TensorFlowInferenceInterface;

public class InferenceRunner {
    private static final String INPUT_NODE = "input";
    private static final String OUTPUT_NODE = "output";

    protected final TensorFlowInferenceInterface mInferenceInterface;
    private final int mGridSize;
    private final int mNumOfLabels;
    private final int mInputSize;

    public InferenceRunner(Context context, String modelFile, int inputSize, int gridSize, int numOfLabels) {
        this.mInputSize = inputSize;
        this.mGridSize = gridSize;
        this.mNumOfLabels = numOfLabels;
        // Load the frozen graph (.pb) from the APK's assets folder.
        mInferenceInterface = new TensorFlowInferenceInterface(context.getAssets(), modelFile);
    }

    public synchronized void runInference(Bitmap image) {
        // Resize to the network input size and convert ARGB pixels to normalized RGB floats (HWC order).
        Trace.beginSection("imageTransform");
        Bitmap bitmap = Bitmap.createScaledBitmap(image, mInputSize, mInputSize, false);
        int[] intValues = new int[mInputSize * mInputSize];
        float[] floatValues = new float[mInputSize * mInputSize * 3];
        bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());
        for (int i = 0; i < intValues.length; ++i) {
            floatValues[i * 3 + 0] = ((intValues[i] >> 16) & 0xFF) / 255.0f; // red
            floatValues[i * 3 + 1] = ((intValues[i] >> 8) & 0xFF) / 255.0f;  // green
            floatValues[i * 3 + 2] = (intValues[i] & 0xFF) / 255.0f;         // blue
        }
        Trace.endSection();

        Trace.beginSection("inferenceFeed");
        mInferenceInterface.feed(INPUT_NODE, floatValues, 1, mInputSize, mInputSize, 3);
        Trace.endSection();

        Trace.beginSection("inferenceRun");
        mInferenceInterface.run(new String[]{OUTPUT_NODE});
        Trace.endSection();

        // Output: gridSize x gridSize cells, 5 boxes per cell, (numOfLabels + 5) values per box.
        final float[] resu = new float[mGridSize * mGridSize * (mNumOfLabels + 5) * 5];
        Trace.beginSection("inferenceFetch");
        mInferenceInterface.fetch(OUTPUT_NODE, resu);
        Trace.endSection();
    }
}
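To check the preprocessing and output-size arithmetic off-device, the conversion loop and the buffer-size expression from runInference() can be pulled into plain Java. The class name YoloTensorMath and its methods are hypothetical, not part of the original code or of the TensorFlow Android API; the sketch assumes the same packed-ARGB pixels that Bitmap.getPixels() returns and the same 5 boxes per cell that the original code hard-codes.

```java
/**
 * Hypothetical plain-Java helpers (not from the original code) that mirror the
 * preprocessing and output-size logic of InferenceRunner, so they can be
 * checked on a desktop JVM without Android.
 */
final class YoloTensorMath {
    private YoloTensorMath() {}

    /** Converts packed ARGB pixels into normalized RGB floats in HWC order. */
    static float[] argbToRgbFloats(int[] argbPixels) {
        float[] out = new float[argbPixels.length * 3];
        for (int i = 0; i < argbPixels.length; ++i) {
            int p = argbPixels[i];
            out[i * 3]     = ((p >> 16) & 0xFF) / 255.0f; // red
            out[i * 3 + 1] = ((p >> 8) & 0xFF) / 255.0f;  // green
            out[i * 3 + 2] = (p & 0xFF) / 255.0f;         // blue
            // The alpha byte (bits 24-31) is ignored, as in the original loop.
        }
        return out;
    }

    /**
     * Number of floats in the output tensor: grid x grid cells, boxesPerCell
     * boxes per cell, and (numOfLabels + 5) values per box
     * (4 box coordinates + 1 objectness score + one score per label).
     */
    static int outputLength(int gridSize, int numOfLabels, int boxesPerCell) {
        return gridSize * gridSize * (numOfLabels + 5) * boxesPerCell;
    }

    public static void main(String[] args) {
        // 0xFF8000FF: alpha=FF, red=80, green=00, blue=FF.
        float[] rgb = argbToRgbFloats(new int[] {0xFF8000FF});
        System.out.println(rgb[0] + " " + rgb[1] + " " + rgb[2]);
        // Values used in the question: 13x13 grid, 20 labels, 5 boxes per cell.
        System.out.println(outputLength(13, 20, 5));
    }
}
```

For the 13x13 grid and 20 VOC labels passed in onCreate(), outputLength(13, 20, 5) is 21125 floats, the size of the resu array.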
MainActivity:onCreate():
...
tinyYolo = new InferenceRunner(getApplicationContext(), TINY_YOLO_MODEL_FILE, TINY_YOLO_INPUT_SIZE, 13, 20);
optimizedTinyYolo = new InferenceRunner(getApplicationContext(), OPTIMIZED_TINY_YOLO_MODEL_FILE, TINY_YOLO_INPUT_SIZE, 13, 20);
...
MainActivity:onResume():
...
mHandler.post(new Runnable() {
    @Override
    public void run() {
        Trace.beginSection("TinyYoloModel");
        for (int i = 0; i < 5; i++) {
            tinyYolo.runInference(b);
        }
        Trace.endSection();
        Log.d(TAG, "run: optimized");
        Trace.beginSection("OptimizedModel");
        for (int i = 0; i < 5; i++) {
            optimizedTinyYolo.runInference(b);
        }
        Trace.endSection();
    }
});
...
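The measurement in onResume() always runs the unoptimized model first and has no warm-up, so it could mix model speed with ordering effects such as per-session first-run overhead or the CPU slowing down as the device heats up. A sketch of a comparison that spreads those effects over both models follows, assuming each model can be wrapped as a Runnable; AlternatingBenchmark and compare() are hypothetical names, and the warm-up and round counts are arbitrary choices.

```java
/**
 * Hypothetical benchmarking helper (not from the original code). It first runs
 * untimed warm-up iterations of both models, then alternates which model goes
 * first in each timed round, so ordering effects are shared by both models
 * instead of being charged to whichever one always runs second.
 */
final class AlternatingBenchmark {
    private AlternatingBenchmark() {}

    /** Returns {meanMillisA, meanMillisB} over the timed rounds. */
    static double[] compare(Runnable a, Runnable b, int warmupRuns, int timedRounds) {
        // Untimed warm-up calls for both models.
        for (int i = 0; i < warmupRuns; i++) {
            a.run();
            b.run();
        }
        long totalA = 0;
        long totalB = 0;
        for (int round = 0; round < timedRounds; round++) {
            // Swap the order every round: A then B, then B then A, ...
            if (round % 2 == 0) {
                totalA += timeNanos(a);
                totalB += timeNanos(b);
            } else {
                totalB += timeNanos(b);
                totalA += timeNanos(a);
            }
        }
        return new double[] {
            totalA / 1e6 / timedRounds,
            totalB / 1e6 / timedRounds
        };
    }

    /** Wall-clock time of a single call, using the monotonic nanosecond clock. */
    private static long timeNanos(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return System.nanoTime() - start;
    }
}
```

Inside the posted Runnable it might be called as AlternatingBenchmark.compare(() -> tinyYolo.runInference(b), () -> optimizedTinyYolo.runInference(b), 2, 10); lambdas need Java 8 language support in the Android build, otherwise anonymous Runnable classes work the same way.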
Trace results (5 runInference() calls per model):
- TinyYoloModel wall duration: 5,525 ms
- OptimizedModel wall duration: 6,043 ms
- TinyYoloModel inferenceRun average: 1,051 ms
- OptimizedModel inferenceRun average: 1,158 ms
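The figures above can be cross-checked with a few lines of arithmetic. The class below is only an illustration (SlowdownMath is a hypothetical name); it divides each wall duration by the 5 calls made per model in onResume() and compares the results with the inferenceRun averages.

```java
/**
 * Hypothetical illustration (not from the original code): recomputes the
 * slowdown from the reported trace numbers. Wall durations cover 5
 * runInference() calls per model, the loop count used in onResume().
 */
final class SlowdownMath {
    private SlowdownMath() {}

    /** Relative change of candidate versus baseline, in percent. */
    static double percentChange(double baselineMs, double candidateMs) {
        return (candidateMs - baselineMs) / baselineMs * 100.0;
    }

    public static void main(String[] args) {
        int runs = 5;
        double tinyWallMs = 5525, optimizedWallMs = 6043;
        double tinyRunMs = 1051, optimizedRunMs = 1158;
        System.out.printf("per call: %.1f ms vs %.1f ms%n",
                tinyWallMs / runs, optimizedWallMs / runs);
        System.out.printf("wall slowdown: %.1f%%%n",
                percentChange(tinyWallMs, optimizedWallMs));
        System.out.printf("inferenceRun slowdown: %.1f%%%n",
                percentChange(tinyRunMs, optimizedRunMs));
        // Per-call time outside the inferenceRun section (preprocessing, feed, fetch).
        System.out.printf("outside inferenceRun: %.1f ms vs %.1f ms%n",
                tinyWallMs / runs - tinyRunMs, optimizedWallMs / runs - optimizedRunMs);
    }
}
```

With these numbers the wall-time slowdown is about 9.4% and the inferenceRun slowdown about 10.2%, while the time per call spent outside inferenceRun is about 54 ms and 51 ms respectively, so the difference comes almost entirely from the inferenceRun section.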
Do you have any idea why the optimized model is slower?
If you need more info, feel free to comment! Thanks for your help.