
I am quantizing a Keras h5 model to uint8. To get full uint8 quantization, user dtlam26 told me in this post that the representative dataset should already be in uint8, otherwise the input layer stays in float32.

The problem is that if I feed uint8 data, I receive the following error during the call to converter.convert():

ValueError: Cannot set tensor: Got tensor of type INT8 but expected type FLOAT32 for input 178, name: input_1

It seems, that the model still expects float32. So I checked the base keras_vggface pretrained model (from here) with

from keras_vggface.vggface import VGGFace
import keras

pretrained_model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg')  # pooling: None, avg or max

pretrained_model.save('vggface_resnet50.h5')  # save() needs a file path; this name is illustrative

and the resulting h5 model has an input layer with dtype float32. Next, I changed the model definition to use uint8 as the input dtype:

def RESNET50(include_top=True, weights='vggface',
             ...):

    if input_tensor is None:
        img_input = Input(shape=input_shape, dtype='uint8')

But for integer inputs only int32 is allowed, and using int32 leads to the next problem: the subsequent layers still expect float32.

Changing the dtype manually for all layers does not seem like the right approach.

Why does my model not accept the uint8 data during quantization and automatically change its input to uint8?

What did I miss? Do you know a solution? Thanks a lot.

Florida Man

1 Answer


SOLUTION from user dtlam26

Even though the model still does not run with the Google NNAPI, the solution for quantizing the model with int8 input and output, using either TF 1.15.3 or TF 2.2.0, is, thanks to dtlam26:

...
import cv2
import numpy as np
import tensorflow as tf

# In TF 1.15 this is tf.lite.TFLiteConverter.from_keras_model_file;
# in TF 2.x the same method lives under tf.compat.v1.lite.TFLiteConverter.
converter = tf.lite.TFLiteConverter.from_keras_model_file(saved_model_dir + modelname)

def representative_dataset_gen():
  for _ in range(10):
    pfad = 'pathtoimage/000001.jpg'
    img = cv2.imread(pfad)
    # Feed float32 samples; the converter derives the int8 scale/zero-point from them.
    img = np.expand_dims(img, 0).astype(np.float32)
    yield [img]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.experimental_new_converter = True

converter.target_spec.supported_types = [tf.int8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
quantized_tflite_model = converter.convert()

if tf.__version__.startswith('1.'):
    with open("test153.tflite", "wb") as f:
        f.write(quantized_tflite_model)
if tf.__version__.startswith('2.'):
    with open("test220.tflite", "wb") as f:
        f.write(quantized_tflite_model)
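To confirm that the conversion actually produced int8 input and output tensors, you can inspect the converted model with the TFLite interpreter. The sketch below uses a tiny stand-in Dense model instead of the VGGFace network (and the TF 2.x `from_keras_model` API), but the converter settings and the check are the same:

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model; substitute your own Keras model here.
inp = tf.keras.Input(shape=(4,))
out = tf.keras.layers.Dense(2)(inp)
model = tf.keras.Model(inp, out)

def representative_dataset_gen():
    # Representative samples are float32; the converter computes
    # the int8 quantization parameters from their value range.
    for _ in range(10):
        yield [np.random.rand(1, 4).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

# Load the converted flatbuffer and check the tensor dtypes.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
input_dtype = interpreter.get_input_details()[0]['dtype']
output_dtype = interpreter.get_output_details()[0]['dtype']
print(input_dtype, output_dtype)  # expected: numpy.int8 for both
```

If either dtype still shows float32, the converter fell back to a float interface, usually because `inference_input_type`/`inference_output_type` were not set or the representative dataset was missing.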