I have a simple Keras model trained on the MNIST dataset.
I want to reimplement this model and run it on an FPGA device. To do that, I want to fully understand how a quantized model works.
First, I converted the model to .tflite format with UINT8 precision using post-training quantization (https://www.tensorflow.org/lite/performance/post_training_quantization).
The quantized model reaches an accuracy of about 90%.
Now I am trying to extract the weights from the quantized model and implement inference in pure Python. I use Netron to visualize the model and read its weights: https://github.com/lutzroeder/netron.
A simple Python implementation (matrix multiplication, bias add and ReLU) works with the float weights, but the same code with the quantized weights does not.
So my question is: how do I write the forward pass of the quantized model using NumPy?
My Keras model looks like this:
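# Assumed imports (tf.keras API); input_shape and num_classes are defined elsewhere
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(Dense(512, input_shape=input_shape))
model.add(Activation(tf.nn.relu))
model.add(Dense(100))
model.add(Activation(tf.nn.relu))
model.add(Dense(num_classes))
model.add(Activation(tf.nn.softmax))
model.compile(
    optimizer=Adam(),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)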
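For reference, the float forward pass of this architecture can be sketched in NumPy as follows. This is only an illustration: the weights are random placeholders, the layer sizes assume flattened 28x28 MNIST inputs (784) and 10 classes, and `forward_float` is a hypothetical helper name, not part of Keras.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the row maximum for numerical stability.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward_float(x, params):
    """x: (batch, 784). params: list of (W, b), with W shaped (in, out)
    the way Keras stores Dense kernels."""
    a = x
    for i, (W, b) in enumerate(params):
        z = a @ W + b
        # ReLU after the hidden layers, softmax after the last one.
        a = softmax(z) if i == len(params) - 1 else relu(z)
    return a

# Random placeholder weights, only to show the shapes 784 -> 512 -> 100 -> 10.
rng = np.random.default_rng(0)
sizes = [784, 512, 100, 10]
params = [(rng.normal(0.0, 0.05, (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
probs = forward_float(rng.random((2, 784)), params)
print(probs.shape)
```

Note that Keras stores Dense kernels as (in, out), while the weights shown for a .tflite fully connected layer are (out, in), which is why the loop further down uses `W.T`.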
I converted it with TocoConverter, and the converted model works in TensorFlow.
Then I tried to write the forward pass in pure Python:
total_seen = 0
num_correct = 0
# W0, b0, W1, b1, W2, b2: weights and biases read from the quantized .tflite model
for img, label in zip(x_test, y_test):
    img = img.astype('uint8')
    total_seen += 1
    label = tf.keras.utils.to_categorical(label, num_classes=num_classes)
    X = img.reshape(1, 784)
    z1 = np.dot(X, W0.T) + b0
    a1 = relu(z1)
    z2 = np.dot(a1, W1.T) + b1
    a2 = relu(z2)
    z3 = np.dot(a2, W2.T) + b2
    prediction = np.argmax(z3)
    label = np.argmax(label)
    if prediction == label:
        num_correct += 1
But the accuracy of this implementation is only about 10%, which is chance level for 10 classes, so something is going wrong. How can I correct it?
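One detail that may matter here: in the affine quantization scheme used by TensorFlow Lite, a stored integer q stands for the real value scale * (q - zero_point). The loop above feeds the raw uint8 values into `np.dot` without applying any scale or zero-point. The sketch below shows the mapping in both directions; the scale, zero-point and weight values are made up for illustration, and `quantize` and `dequantize` are hypothetical helper names.

```python
import numpy as np

def quantize(r, scale, zero_point):
    """Map real values to uint8: q = round(r / scale) + zero_point."""
    q = np.round(r / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Map stored uint8 values back to approximate real values."""
    return scale * (q.astype(np.float32) - zero_point)

# Made-up weights and quantization parameters (not taken from the real model).
w_real = np.array([[-0.4, 0.0, 0.25],
                   [0.3, -0.25, 0.1]], dtype=np.float32)
w_scale, w_zero_point = 0.004, 128

w_q = quantize(w_real, w_scale, w_zero_point)
w_back = dequantize(w_q, w_scale, w_zero_point)

print(w_q)     # raw uint8 values, all non-negative
print(w_back)  # close to w_real, within one quantization step
```

Used as-is, the raw `w_q` values are all non-negative and shifted by the zero-point, so they do not behave like the original signed weights.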
Thanks in advance.
Edit: I've read the paper on quantization in TensorFlow: http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf
I now understand most of it, including what the S (scale) and Z (zero-point) values are for activations and kernels. But after the matrix multiplication, the result should be multiplied by the factor M := S1*S2/S3, and I don't know what the S3 scale is or how to get it, because I can't see anything related to it in the Netron graph. Any suggestions?
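To make the formula concrete, the following is a minimal NumPy sketch of the paper's quantized matrix multiplication for one dense layer. In the paper, S3 and Z3 are the scale and zero-point of the layer's output, r3 = S3 * (q3 - Z3); here they are hand-picked placeholders, as are all other scales, zero-points and the random data. The name `quantized_dense` and the assignment of index 1 to the input and index 2 to the weights are assumptions of this sketch. M is applied in floating point for brevity, whereas the paper implements it as a fixed-point multiplier.

```python
import numpy as np

def quantize(r, scale, zero_point):
    q = np.round(r / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def quantized_dense(q_x, q_w, q_bias, Z1, Z2, Z3, M):
    """q_x: (batch, in) uint8, q_w: (out, in) uint8,
    q_bias: (out,) int32 with scale S1*S2 and zero-point 0."""
    # Accumulate in int32 after removing the zero-points.
    acc = (q_x.astype(np.int32) - Z1) @ (q_w.astype(np.int32) - Z2).T + q_bias
    # Rescale the accumulator to the output scale and add the output zero-point.
    q_out = np.round(M * acc) + Z3
    return np.clip(q_out, 0, 255).astype(np.uint8)

# Random placeholder data for a tiny layer: 8 inputs, 3 outputs.
rng = np.random.default_rng(0)
x = rng.random((4, 8)).astype(np.float32)
w = rng.uniform(-0.5, 0.5, (3, 8)).astype(np.float32)
b = rng.uniform(-0.1, 0.1, 3).astype(np.float32)

# Hand-picked quantization parameters (assumptions, not read from a model).
S1, Z1 = 1.0 / 255.0, 0      # input activations
S2, Z2 = 1.0 / 255.0, 128    # weights
S3, Z3 = 0.04, 128           # output activations
M = S1 * S2 / S3

q_bias = np.round(b / (S1 * S2)).astype(np.int32)
q_out = quantized_dense(quantize(x, S1, Z1), quantize(w, S2, Z2),
                        q_bias, Z1, Z2, Z3, M)

approx = S3 * (q_out.astype(np.float64) - Z3)  # dequantized result
exact = x @ w.T + b                            # float reference
print(np.abs(approx - exact).max())
```

With these placeholder values the dequantized output stays within a couple of output quantization steps of the float result, which is a quick way to check an implementation layer by layer.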