
I have a simple Keras model trained on the MNIST dataset.

What I'm trying to do is rewrite this model and run it on an FPGA. To do this, I want to fully understand how a quantized model works.

First, I converted this model to .tflite format with UINT8 precision using post-training quantization (https://www.tensorflow.org/lite/performance/post_training_quantization).
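For context, in the uint8 scheme every tensor carries a scale and a zero point, and a stored value q stands for the real number scale * (q - zero_point). The sketch below is a simplified min/max version of how such parameters can be chosen and applied; the helper names are hypothetical, and the real converter may pick ranges differently (for example by nudging the zero point), so treat it as an illustration only.

```python
import numpy as np

def choose_qparams(x_min, x_max):
    """Hypothetical helper: affine uint8 parameters covering [x_min, x_max]."""
    # Widen the range to include 0.0 so that real zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / 255.0
    zero_point = int(round(-x_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    # q = round(x / scale) + zero_point, saturated to [0, 255]
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    # real ~= scale * (q - zero_point); cast first so uint8 values cannot wrap around
    return scale * (q.astype(np.float32) - zero_point)

# Random stand-in for one Dense kernel (not the real MNIST weights).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(512, 784)).astype(np.float32)

scale, zero_point = choose_qparams(float(w.min()), float(w.max()))
w_q = quantize(w, scale, zero_point)
w_hat = dequantize(w_q, scale, zero_point)
print(scale, zero_point, float(np.abs(w - w_hat).max()))
```

The round-trip error per element stays within about half a scale step, which is why the raw uint8 values themselves are not usable as weights without the scale and zero point.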

The quantized model's accuracy is about 90%.

Now I'm trying to extract the weights from the quantized model and implement inference in pure Python. I use Netron (https://github.com/lutzroeder/netron) to visualize the model and get its weights.

A simple Python implementation (matrix multiplication, bias addition and ReLU) works, but the same code with the quantized weights does not.

So my question is: how do I write the forward pass for the quantized model using NumPy?

My Keras model looks like this:

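import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam

input_shape = (784,)  # flattened 28x28 MNIST images
num_classes = 10

model = Sequential()
model.add(Dense(512, input_shape=input_shape))
model.add(Activation(tf.nn.relu))
model.add(Dense(100))
model.add(Activation(tf.nn.relu))
model.add(Dense(num_classes))
model.add(Activation(tf.nn.softmax))
model.compile(
    optimizer=Adam(),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)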
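For comparison, a float forward pass of this architecture in NumPy might look like the sketch below. It uses the Keras kernel layout (inputs, units), so it computes x @ kernel; as far as I know, TFLite's FullyConnected op stores weights as (units, inputs), which would explain the W0.T in the loop further down. The weights here are random placeholders, not trained values.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row max for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, params):
    """x: (batch, 784) float32; params: [(kernel, bias), ...] in Keras layout."""
    (k0, b0), (k1, b1), (k2, b2) = params
    a1 = relu(x @ k0 + b0)        # Dense(512) + ReLU
    a2 = relu(a1 @ k1 + b1)       # Dense(100) + ReLU
    return softmax(a2 @ k2 + b2)  # Dense(num_classes) + softmax

# Random stand-in weights; with the real model, model.get_weights() returns
# [kernel0, bias0, kernel1, bias1, kernel2, bias2] in this layout.
rng = np.random.default_rng(0)
shapes = [(784, 512), (512, 100), (100, 10)]
params = [(rng.normal(0, 0.05, s).astype(np.float32), np.zeros(s[1], np.float32))
          for s in shapes]
probs = forward(rng.random((2, 784), dtype=np.float32), params)
print(probs.shape, probs.sum(axis=1))
```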

I converted it with TocoConverter, and the converted model works in TensorFlow.

Then I tried to write the forward pass in pure Python:

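import numpy as np
import tensorflow as tf

def relu(x):
    return np.maximum(x, 0)

# W0, b0, W1, b1, W2, b2: quantized weights and biases read from the .tflite file with Netron
# x_test, y_test: MNIST test images and labels
total_seen = 0
num_correct = 0
for img, label in zip(x_test, y_test):
    img = img.astype('uint8')
    total_seen += 1
    label = tf.keras.utils.to_categorical(label, num_classes=num_classes)
    X = img.reshape(1, 784)
    z1 = np.dot(X, W0.T) + b0
    a1 = relu(z1)
    z2 = np.dot(a1, W1.T) + b1
    a2 = relu(z2)
    z3 = np.dot(a2, W2.T) + b2
    prediction = np.argmax(z3)
    label = np.argmax(label)
    if prediction == label:
        num_correct += 1

print('accuracy:', num_correct / total_seen)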

But this version's accuracy is only about 10%, so something is wrong. How can I correct it?
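One thing worth ruling out in the loop above, independent of scales and zero points: NumPy keeps integer dtypes in np.dot, so if X and W0 are both uint8, the dot product wraps modulo 256 instead of growing into a wider type. A tiny demonstration with made-up values:

```python
import numpy as np

x = np.array([200, 200], dtype=np.uint8)
w = np.array([2, 2], dtype=np.uint8)

wrapped = np.dot(x, w)                                   # result stays uint8: 800 wraps modulo 256
exact = np.dot(x.astype(np.int32), w.astype(np.int32))   # widened accumulator

print(wrapped, wrapped.dtype)  # 32 uint8
print(exact)                   # 800
```

Casting to int32 (or dequantizing to float first) avoids this. Relatedly, if x_test was scaled to [0, 1] floats before the loop, img.astype('uint8') would truncate nearly every pixel to 0; whether that applies depends on preprocessing not shown here.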

Thanks in advance.

Edit: I've read the paper about quantization in TensorFlow: http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf

I understand almost everything: I know what the S (scale) and Z (zero point) values are for activations and kernels. But after the matrix multiplication, the result should be multiplied by the factor M := S1*S2/S3, and I don't know what the S3 scale is or how to get it, because I can't see anything related to it in the Netron graph. Any suggestions?
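In the paper's notation, S1 and S2 belong to the two inputs of the multiplication (here the input activations and the weights), and S3, Z3 are the quantization parameters of the output, i.e. of the layer's output activation tensor. A .tflite file stores a scale and zero point for every tensor, including each fully connected layer's output, so that is where S3 would come from. The paper also quantizes the bias to int32 with scale S1*S2 and zero point 0. The sketch below follows that recipe for one layer; the function name and all parameter values are hypothetical, and it multiplies by M in floating point for readability, whereas the paper uses a fixed-point multiplier.

```python
import numpy as np

def quantized_dense(q_x, s_x, z_x, q_w, s_w, z_w, q_b, s_out, z_out, relu=True):
    """Integer-arithmetic Dense layer in the style of Jacob et al. (2018).

    q_x: (inputs,) uint8; q_w: (units, inputs) uint8; q_b: (units,) int32 bias
    assumed quantized with scale s_x * s_w and zero point 0.
    """
    # Widen to int32 before removing the zero points so nothing wraps around.
    acc = (q_w.astype(np.int32) - z_w) @ (q_x.astype(np.int32) - z_x) + q_b
    m = s_x * s_w / s_out            # the paper's M := S1 * S2 / S3
    q = z_out + np.round(m * acc)    # float multiply here; the paper uses a fixed-point M
    lo = z_out if relu else 0        # ReLU clamps at the quantized value of 0.0
    return np.clip(q, lo, 255).astype(np.uint8)

# Hypothetical quantization parameters and random integer tensors.
rng = np.random.default_rng(0)
s_x, z_x = 1.0 / 255, 0              # input pixels covering [0, 1]
s_w, z_w = 0.002, 128                # weights covering about [-0.256, 0.254]
s_out, z_out = 0.1, 0                # output (S3, Z3) covering [0, 25.5]
q_x = rng.integers(0, 256, 784).astype(np.uint8)
q_w = rng.integers(0, 256, (16, 784)).astype(np.uint8)
q_b = rng.integers(-2000, 2000, 16).astype(np.int32)

q_out = quantized_dense(q_x, s_x, z_x, q_w, s_w, z_w, q_b, s_out, z_out)

# Float reference: dequantize everything, compute, apply ReLU.
x = s_x * (q_x.astype(np.float64) - z_x)
w = s_w * (q_w.astype(np.float64) - z_w)
y_ref = np.maximum(w @ x + s_x * s_w * q_b, 0.0)
y_hat = s_out * (q_out.astype(np.float64) - z_out)
print(np.abs(y_hat - y_ref).max())  # at most about s_out / 2
```

The output q_out is again uint8 with parameters (S3, Z3), so it can be fed straight into the next layer as that layer's input.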

Damian
  • Please add the code you tried with the weights. Even better, add some simple examples so that people can see where the problem lies. – E.Coms Nov 21 '18 at 22:24
  • Did you manage to implement the model on an FPGA? I'm trying to do the same but can't figure out the correct calculation flow. – Nazar Dec 05 '19 at 19:21

1 Answer


There are two steps you need to take:

  1. Dequantize the input, weights and bias back to full precision (or an integer equivalent):

    (w-w_offset)*w_scale

  2. After the ReLU, quantize the activations back to integers:

    a/a_scale+a_offset

    You can probably skip step 2 (quantizing and then dequantizing the activations), at a minor risk of getting results that differ from the TFLite model. This is because ReLU has no upper bound, whereas TFLite saturates the activations at a maximum value.
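A minimal NumPy sketch of these two steps, assuming per-tensor scales and zero points read from the .tflite file, weights stored as (units, inputs) as the W0.T in the question suggests, and an int32 bias whose scale is input_scale * weight_scale (zero point 0). The dict keys and the numbers in the usage part are made up for illustration.

```python
import numpy as np

def dequantize(q, scale, zero_point):
    # Step 1: (w - w_offset) * w_scale, in float32 so uint8 values cannot wrap around
    return (q.astype(np.float32) - zero_point) * scale

def quantize(a, scale, zero_point):
    # Step 2: a / a_scale + a_offset, rounded and saturated to the uint8 range
    return np.clip(np.round(a / scale) + zero_point, 0, 255).astype(np.uint8)

def forward(q_x, x_qp, layers, requantize=True):
    """q_x: flattened uint8 image; x_qp: (scale, zero_point) of the input tensor.

    layers: list of dicts with hypothetical keys
      'w', 'w_qp'     uint8 weights (units, inputs) and their (scale, zero_point)
      'b', 'b_scale'  int32 bias and its scale (zero point assumed to be 0)
      'out_qp'        (scale, zero_point) of the layer's output tensor
    """
    a = dequantize(q_x, *x_qp)
    for i, layer in enumerate(layers):
        w = dequantize(layer['w'], *layer['w_qp'])
        b = layer['b'].astype(np.float32) * layer['b_scale']
        z = w @ a + b
        if i == len(layers) - 1:
            return z                  # logits: argmax gives the prediction
        a = np.maximum(z, 0.0)        # ReLU
        if requantize:
            # Optional step 2: quantize-dequantize round trip, which also saturates
            a = dequantize(quantize(a, *layer['out_qp']), *layer['out_qp'])

# Usage with random stand-in tensors and made-up quantization parameters.
rng = np.random.default_rng(0)
sizes = [784, 512, 100, 10]
in_scale = 1.0 / 255
layers = []
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    layers.append({
        'w': rng.integers(0, 256, (n_out, n_in)).astype(np.uint8), 'w_qp': (0.002, 128),
        'b': rng.integers(-500, 500, n_out).astype(np.int32), 'b_scale': in_scale * 0.002,
        'out_qp': (0.05, 0),
    })
    in_scale = 0.05  # the next layer's input scale is this layer's output scale
q_x = rng.integers(0, 256, 784).astype(np.uint8)
logits = forward(q_x, (1.0 / 255, 0), layers)
print(logits.shape, int(np.argmax(logits)))
```

Passing requantize=False corresponds to skipping step 2; the difference from the TFLite result then comes mainly from activations that TFLite would have saturated.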

You can check out my TFLite tutorials on my GitHub, where I introduce the concepts and training; I'm about to write about inference.

SoonYau