TensorFlow - Preprocessing image in model prediction

Question

I have trained a model using the Functional API and two different pre-trained models: EfficientNet B5 and MobileNet V2. After training, I'm running an application that loads the saved model to make predictions.

I have a doubt about the correct way to pass images to "model.predict()".

Model:

    self.feature_extractor1 = EfficientNetB5(#weights='imagenet',
                                  input_shape=self.input_shape,
                                  include_top=False)

    self.feature_extractor2 = MobileNetV2(#weights='imagenet',
                                  input_shape=self.input_shape,
                                  include_top=False)


    for layer in self.feature_extractor1.layers:
        layer.trainable = False    

    for layer in self.feature_extractor2.layers:
        layer.trainable = False        
    

    input_ = Input(shape=self.input_shape)
    processed_input1 = b5_preprocess_input(input_)

    processed_input2 = mbv2_preprocess_input(input_)

    x1 = self.feature_extractor1(processed_input1)
    x1 = GlobalAveragePooling2D()(x1)
    x1 = Dropout(0.2)(x1)
    x1 = Flatten()(x1)

    x2 = self.feature_extractor2(processed_input2)
    x2 = GlobalAveragePooling2D()(x2)
    x2 = Dropout(0.2)(x2)
    x2 = Flatten()(x2)

    x = Concatenate()([x1, x2])

    x = Dense(512, activation='relu')(x) #,kernel_initializer=initializer,kernel_regularizer=regularizers.l2(0.001)) 
    x = Dense(1024, activation='relu')(x)

    output_shape = Dense(shape_categories, activation='softmax', name='shape')(x)

    model = Model(inputs=input_,
                  outputs=output_shape)
                  
    adam_kwargs = {'beta_1': 0.9, 'beta_2': 0.9, 'epsilon': 1e-7}
    sgd_kwargs = {'decay': 1e-6, 'momentum': 0.9, 'nesterov': True}
    optimizer = self.optimizers(kwargs=adam_kwargs)
    
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])

    model.summary()

    STEP_SIZE_TRAIN = self.phase_gen[0].n// self.phase_gen[0].batch_size
    STEP_SIZE_VALID = self.phase_gen[1].n// self.phase_gen[1].batch_size
    if self.phases == 3:
        STEP_SIZE_TEST = self.phase_gen[2].n// self.phase_gen[2].batch_size

    checkpoint = ModelCheckpoint(self.model_dir,
                                monitor='val_accuracy',
                                verbose=1,
                                save_best_only=True,
                                mode='max')
    tensorboard = TensorBoard(log_dir=self.model_dir + '/logs',
                            histogram_freq=5,
                            embeddings_freq=5)
                            #[EarlyStopping(monitor='val_loss', patience=8)]
    callbacks = [checkpoint, tensorboard]

    
    hist = model.fit_generator(generator=self.phase_gen[0],
                               steps_per_epoch=STEP_SIZE_TRAIN,
                               validation_data=self.phase_gen[1],
                               validation_steps=STEP_SIZE_VALID,
                               epochs=self.epochs,
                               callbacks=callbacks
                               )

In another script, I have the prediction method:

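    import io

    import numpy as np
    from flask import request
    from PIL import Image
    # Assumed source of `image` (the original snippet omitted its imports)
    from tensorflow.keras.preprocessing import image
    from tensorflow.keras.applications.mobilenet_v2 import preprocess_input as mbv2_preprocess_input
    from tensorflow.keras.applications.efficientnet import preprocess_input as b5_preprocess_input

    # `app` (the Flask application) and get_modelSHP() are defined elsewhere in the script

    def preprocess_image(img):
        # Decode the uploaded bytes, resize to the model input size, add a batch axis
        img = Image.open(io.BytesIO(img))
        img = img.resize((224, 224), Image.ANTIALIAS)
        img = image.img_to_array(img)
        img = np.expand_dims(img, axis=0)
        #return [b5_preprocess_input(img),  mbv2_preprocess_input(img)]
        return [img, img]

    modelSHP = get_modelSHP()

    @app.route('/part_numbers', methods=['POST'])
    def part_number():
        img = request.files.get('image').read()
        processed_image = preprocess_image(img)
        predict_shape = modelSHP.predict(processed_image)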

My first thought was that I needed to pass the image preprocessed by the correct function, in the same order I used during training. But when I did that, my prediction accuracy stayed around zero. When I passed just the image, without any preprocessing, the results got better.

Is passing the image to model.predict() without preprocessing the right approach? I was wondering whether, with the Functional API and the way I built the model, the preprocessing became a layer in each branch of the model.

score 0 · Accepted Answer · answered Aug 03 '21 at 14:25

I copied your code and then printed out the model summary, as shown below:

Model: "functional_5"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_23 (InputLayer)           [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
tf.math.truediv_5 (TFOpLambda)  (None, 224, 224, 3)  0           input_23[0][0]                   
__________________________________________________________________________________________________
tf.math.subtract_5 (TFOpLambda) (None, 224, 224, 3)  0           tf.math.truediv_5[0][0]          
__________________________________________________________________________________________________
efficientnetb5 (Functional)     (None, 7, 7, 2048)   28513527    input_23[0][0]                   
__________________________________________________________________________________________________
mobilenetv2_1.00_224 (Functiona (None, 7, 7, 1280)   2257984     tf.math.subtract_5[0][0]         
__________________________________________________________________________________________________
global_average_pooling2d_8 (Glo (None, 2048)         0           efficientnetb5[0][0]             
__________________________________________________________________________________________________
global_average_pooling2d_9 (Glo (None, 1280)         0           mobilenetv2_1.00_224[0][0]       
__________________________________________________________________________________________________
dropout_8 (Dropout)             (None, 2048)         0           global_average_pooling2d_8[0][0] 
__________________________________________________________________________________________________
dropout_9 (Dropout)             (None, 1280)         0           global_average_pooling2d_9[0][0] 
__________________________________________________________________________________________________
flatten_8 (Flatten)             (None, 2048)         0           dropout_8[0][0]                  
__________________________________________________________________________________________________
flatten_9 (Flatten)             (None, 1280)         0           dropout_9[0][0]                  
__________________________________________________________________________________________________
concatenate_3 (Concatenate)     (None, 3328)         0           flatten_8[0][0]                  
                                                                 flatten_9[0][0]                  
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 512)          1704448     concatenate_3[0][0]              
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 1024)         525312      dense_6[0][0]                    
__________________________________________________________________________________________________
shape (Dense)                   (None, 2)            2050        dense_7[0][0]                    
==================================================================================================
Total params: 33,003,321
Trainable params: 2,231,810
Non-trainable params: 30,771,511

As you postulated, the preprocessing becomes layers in the model. So for predictions you do not have to preprocess the input, as that is built into the model. If you preprocess the image yourself as well, the scaling is applied twice, which explains the poor accuracy you saw. For EfficientNet the preprocessing function is simply a pass-through, as EfficientNet expects input pixels in the range 0 to 255. That is why the model summary shows the input (input_23) feeding directly into efficientnetb5. For MobileNet the preprocessing function scales the pixels to between -1 and +1. That is done by the equation scaled_pixel = pixel/127.5 - 1. So layer tf.math.truediv_5 divides input_23 by 127.5, and then layer tf.math.subtract_5 subtracts 1.
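To make the numbers concrete, here is a minimal NumPy sketch of that arithmetic. It does not call TensorFlow: `mobilenet_v2_scale` is a hand-written stand-in for the pixel/127.5 - 1 formula, and `to_raw_batch` is a hypothetical helper name I made up for illustration. It shows what the MobileNet branch receives when the caller passes raw pixels, versus when the caller has already scaled them.

```python
import numpy as np

def mobilenet_v2_scale(pixels):
    # Stand-in for the formula described above: pixel / 127.5 - 1.
    return np.asarray(pixels, dtype="float32") / 127.5 - 1.0

def to_raw_batch(rgb_image):
    # Hypothetical helper: add a batch axis and leave values in 0..255.
    return np.expand_dims(np.asarray(rgb_image, dtype="float32"), axis=0)

# Dummy 224x224 RGB image containing the extreme pixel values 0 and 255.
dummy = np.zeros((224, 224, 3), dtype="uint8")
dummy[:112] = 255

batch = to_raw_batch(dummy)        # raw pixels, as passed to the model
once = mobilenet_v2_scale(batch)   # what the in-model layers compute from raw pixels
twice = mobilenet_v2_scale(once)   # what they compute if the caller scaled first

print(batch.shape)                 # (1, 224, 224, 3)
print(once.min(), once.max())      # -1.0 1.0
print(twice.min(), twice.max())    # roughly -1.008 and -0.992
```

With double scaling, every pixel collapses into a band about 0.016 wide around -1, so the MobileNet branch sees an almost constant image. Note also that the model in the question is built with a single Input, so a single array of raw pixels like `batch` (rather than a list of two arrays) matches what the summary shows.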

Thanks! I was not checking the summary (sadly, this idea didn't come to my mind). - wNakano, Aug 04 '21 at 02:16
