I followed this tutorial to quantize my graph to 8 bits. I can't share the exact graph here, but I can say it's a simple convolutional neural network.

When I run the benchmark tool on the original and quantized networks, the quantized network is clearly much slower (100 ms vs. 4.5 ms).

Slowest nodes in the original network:

time average [ms]  [%]     [cdf%]  [Op]     [Name]
1.198              26.54%  26.54%  MatMul   fc10/fc10/MatMul
0.337              7.47%   34.02%  Conv2D   conv2/Conv2D
0.332              7.36%   41.37%  Conv2D   conv4/Conv2D
0.323              7.15%   48.53%  Conv2D   conv3/Conv2D
0.322              7.14%   55.66%  Conv2D   conv5/Conv2D
0.310              6.86%   62.53%  Conv2D   conv1/Conv2D
0.118              2.61%   65.13%  Conv2D   conv2_1/Conv2D
0.105              2.32%   67.45%  MaxPool  pool1

Slowest nodes in the quantized network:

time average [ms]  [%]     [cdf%]  [Op]             [Name]
8.289              47.67%  47.67%  QuantizedMatMul  fc10/fc10/MatMul_eightbit_quantized_bias_add
5.398              5.33%   53.00%  QuantizedConv2D  conv5/Conv2D_eightbit_quantized_conv
5.248              5.18%   58.18%  QuantizedConv2D  conv4/Conv2D_eightbit_quantized_conv
4.981              4.92%   63.10%  QuantizedConv2D  conv2/Conv2D_eightbit_quantized_conv
4.908              4.85%   67.95%  QuantizedConv2D  conv3/Conv2D_eightbit_quantized_conv
3.167              3.13%   71.07%  QuantizedConv2D  conv5_1/Conv2D_eightbit_quantized_conv
3.049              3.01%   74.08%  QuantizedConv2D  conv4_1/Conv2D_eightbit_quantized_conv
2.973              2.94%   77.02%  QuantizedMatMul  fc11/MatMul_eightbit_quantized_bias_add

What is the reason for this? I'm using a version of TensorFlow compiled from source, without GPU support.

yossiB
Are you running on a GPU? If so, the float graph will be placed on the GPU, which gives a speedup, but quantized ops currently have no GPU implementations, so they will be placed on the CPU, which causes a slowdown. Could you take a look at your op placement and let us know? – suharshs Oct 23 '17 at 23:58

1 Answer

https://github.com/tensorflow/tensorflow/issues/2807

See the comments on this issue. It seems that the quantized ops aren't yet optimized for x86. My quantized Inception-ResNet-v2 also runs slower than the original.
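To see whether the slowdown is spread across all ops or concentrated in a few, it can help to compare the per-node timings from the two benchmark runs. Below is a minimal sketch that does this for the rows posted in the question. The helper names (`parse_rows`, `slowdowns`) are made up for illustration, and stripping everything from `_eightbit_` onward to recover the original node name is an assumption based only on the node names shown in the question.

```python
# Rows copied from the benchmark output in the question (header lines omitted).
ORIGINAL = """\
1.198   26.54%  26.54%  MatMul  fc10/fc10/MatMul
0.337   7.47%   34.02%  Conv2D  conv2/Conv2D
0.332   7.36%   41.37%  Conv2D  conv4/Conv2D
0.323   7.15%   48.53%  Conv2D  conv3/Conv2D
0.322   7.14%   55.66%  Conv2D  conv5/Conv2D
"""

QUANTIZED = """\
8.289   47.67%  47.67%  QuantizedMatMul fc10/fc10/MatMul_eightbit_quantized_bias_add
5.398   5.33%   53.00%  QuantizedConv2D conv5/Conv2D_eightbit_quantized_conv
5.248   5.18%   58.18%  QuantizedConv2D conv4/Conv2D_eightbit_quantized_conv
4.981   4.92%   63.10%  QuantizedConv2D conv2/Conv2D_eightbit_quantized_conv
4.908   4.85%   67.95%  QuantizedConv2D conv3/Conv2D_eightbit_quantized_conv
"""


def parse_rows(text):
    """Map each node name to its average time in ms.

    Expects five whitespace-separated fields per row:
    time, percent, cumulative percent, op type, node name.
    Rows with a different number of fields are skipped.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue
        avg_ms, _pct, _cdf, _op, name = parts
        # Assumption: quantized nodes keep the original name and append
        # a suffix starting with "_eightbit_", as in the posted output.
        base_name = name.split("_eightbit_")[0]
        rows[base_name] = float(avg_ms)
    return rows


def slowdowns(original, quantized):
    """Return quantized/original time ratios for nodes present in both runs."""
    common = sorted(set(original) & set(quantized))
    return {name: quantized[name] / original[name] for name in common}


if __name__ == "__main__":
    ratios = slowdowns(parse_rows(ORIGINAL), parse_rows(QUANTIZED))
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.1f}x slower")
```

For the rows above, every matched node comes out slower in the quantized graph, which fits the quantized kernels themselves being slow on this CPU rather than a single op being the bottleneck.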

Sungsu Lim