I have a model trained in Keras and saved as a .h5 file. The model was trained with single-precision floating-point values on the TensorFlow backend. Now I want to implement a hardware accelerator that performs the convolution operation on a Xilinx FPGA. However, before I decide on the fixed-point bit width to use on the FPGA, I need to evaluate the model accuracy after quantizing the weights to 8-bit or 16-bit numbers. I came across TensorFlow's quantize functionality, but I am not sure how to take the weights from each layer, quantize them, and store them in a list of NumPy arrays. After all layers are quantized, I want to set the weights of the model to the newly formed quantized weights. Could someone help me do this?
This is what I have tried so far to reduce the precision from float32 to float16. Please let me know if this is the correct approach.
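    import numpy as np

    # w_orginal: list of float32 NumPy arrays, e.g. from model.get_weights()
    w_fp_16_test = []
    for i in range(len(w_orginal)):
        temp_shape = w_orginal[i].shape
        print('Shape of index: ' + str(i) + ' array is :')
        print(temp_shape)
        # flatten() returns a copy, so the original weights are left untouched
        temp_array_flat = w_orginal[i].flatten()
        # Loop over every element of the flattened copy, not just the first axis
        for j in range(len(temp_array_flat)):
            # Value is rounded to float16 precision; the array dtype stays float32
            temp_array_flat[j] = temp_array_flat[j].astype(np.float16)
        temp_array_flat = temp_array_flat.reshape(temp_shape)
        w_fp_16_test.append(temp_array_flat)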
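For the 8-bit and 16-bit fixed-point case, something along the following lines might be a starting point. It is only a sketch under stated assumptions: `quantize_fixed_point` is a hypothetical helper (not a Keras or TensorFlow API), it assumes a signed format with a power-of-two scale picked per tensor, and random arrays stand in for the real weights so the snippet runs on its own. The quantized values are kept as float32 so they could be passed back to the model.

```python
import numpy as np

def quantize_fixed_point(w, total_bits=8, frac_bits=None):
    """Round `w` onto a signed fixed-point grid and return it as float32.

    Hypothetical helper: `total_bits` includes the sign bit. If `frac_bits`
    is None, the largest number of fractional bits that still fits the
    tensor's maximum magnitude is used (a per-tensor choice, an assumption).
    """
    w = np.asarray(w, dtype=np.float32)
    q_min = -(2 ** (total_bits - 1))
    q_max = 2 ** (total_bits - 1) - 1
    if frac_bits is None:
        max_abs = float(np.max(np.abs(w))) if w.size else 0.0
        if max_abs > 0:
            frac_bits = int(np.floor(np.log2(q_max / max_abs)))
        else:
            frac_bits = total_bits - 1  # all-zero tensor: any scale works
    scale = 2.0 ** frac_bits
    # Round to the nearest integer code (ties go to even), then saturate
    q = np.clip(np.round(w * scale), q_min, q_max)
    return (q / scale).astype(np.float32), frac_bits

# Stand-in for `w_orginal = model.get_weights()` on a loaded Keras model
rng = np.random.default_rng(0)
w_orginal = [
    rng.normal(0.0, 0.1, size=(3, 3, 1, 4)).astype(np.float32),  # conv kernel
    np.zeros(4, dtype=np.float32),                               # bias
]

w_q8 = [quantize_fixed_point(w, total_bits=8)[0] for w in w_orginal]
w_q16 = [quantize_fixed_point(w, total_bits=16)[0] for w in w_orginal]

for i, (w, q) in enumerate(zip(w_orginal, w_q8)):
    print(i, w.shape, 'max abs error (8 bit):', float(np.max(np.abs(w - q))))

# With a real model, the round trip would presumably be:
#   model.set_weights(w_q8)
#   model.evaluate(x_test, y_test)   # x_test / y_test are assumed to exist
```

This only quantizes the weights; activations and accumulators are still computed in float32, so the measured accuracy is an upper bound on what a fully fixed-point datapath would give. An element-wise float16 round trip over a whole array can also be written without a Python loop as `w.astype(np.float16).astype(np.float32)`.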