
Is preprocessing such as input normalization performed by default by the TensorFlow Object Detection API?

I cannot find any documentation on it anywhere. There is an option called 'NormalizeImage' in the data augmentations, but I never see it used in any of the configuration files for the models in the zoo. I trained ssd_mobilenet_v3_small_coco_2020_01_14 for transfer learning to my custom class without using it and everything works. I know there is a similar question here, but it has gone unanswered for a couple of years and the network is different.
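
For reference, this is roughly what the 'NormalizeImage' option would look like in a pipeline config; the values here are illustrative, not taken from any zoo config:

data_augmentation_options {
  normalize_image {
    original_minval: 0.0
    original_maxval: 255.0
    target_minval: -1.0
    target_maxval: 1.0
  }
}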

Testing with the following code (OpenCV 4.3.0 DNN module) produces the correct result:

import cv2 as cv


# Load the frozen TF graph plus the text graph that OpenCV uses to import it
net = cv.dnn_DetectionModel('model/graph/frozen_inference_graph.pb', 'cvgraph.pbtxt')
net.setInputSize(300, 300)
# Input normalization deliberately left disabled:
#net.setInputScale(1.0 / 127.5)
#net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

frame = cv.imread('test/2_329_985_165605-561561.jpg')

classes, confidences, boxes = net.detect(frame, confThreshold=0.7)

for classId, confidence, box in zip(classes.flatten(), confidences.flatten(), boxes):
    print(classId, confidence)
    cv.rectangle(frame, box, color=(0, 255, 0))

cv.imshow('out', frame)
cv.waitKey()

While here, normalization is used. Using normalization in my case produces a wrong result: the bounding box is much bigger than it should be. I guess that input normalization is performed somewhere under the hood by TensorFlow?
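
For clarity, the variant that gives me the oversized boxes simply re-enables the two commented-out lines from the snippet above, i.e. the usual (x - 127.5) / 127.5 mapping:

net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))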

rok

1 Answer


Even if I am probably too late to help you, I want to answer the question, as I came across it while having a pretty similar problem of understanding how the normalization is defined. Maybe it helps someone else.

I even posted my own question (here) but found the answer an hour later. As I can't find the model you used (the TF1 model zoo leads to a dead link for ssd_mobilenet_v3_small_coco), I assume that its pipeline looks similar to the one I used.

In the pipeline config, a feature extractor is defined:

feature_extractor {
  type: "ssd_mobilenet_v2_keras"
  depth_multiplier: 1.0
  ...
}

This corresponds to this feature extractor, in which the following preprocessing function is defined:

def preprocess(self, resized_inputs):
  """SSD preprocessing.
  Maps pixel values to the range [-1, 1].
  Args:
    resized_inputs: a [batch, height, width, channels] float tensor
      representing a batch of images.
  Returns:
    preprocessed_inputs: a [batch, height, width, channels] float tensor
    representing a batch of images.
  """
  return (2.0 / 255.0) * resized_inputs - 1.0

If you do the math you'll see that this is exactly the same as

image = (image-127.5)/127.5

just formatted in a different way. I hope this helps someone!
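
A quick way to convince yourself: (2.0/255.0)*x - 1.0 = (2x - 255)/255 = (x - 127.5)/127.5. Or check it numerically, for example:

import numpy as np

x = np.linspace(0.0, 255.0, 11)   # sample pixel values
a = (2.0 / 255.0) * x - 1.0       # the API's preprocess() mapping
b = (x - 127.5) / 127.5           # the setInputScale/setInputMean form
print(np.allclose(a, b))          # True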

EDIT: However, I just realized that this does not explain why OP's model works better without preprocessing. I guess OP's preprocessing must already be defined in the cvgraph, as stated in the OpenCV docs.
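
For what it's worth, a cvgraph.pbtxt like OP's is typically generated with OpenCV's tf_text_graph_ssd.py helper, which reads the frozen graph together with the pipeline config. Assuming OP's file was produced that way (the question doesn't say), the call would look something like:

python tf_text_graph_ssd.py \
    --input model/graph/frozen_inference_graph.pb \
    --config model/graph/pipeline.config \
    --output cvgraph.pbtxt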

Burschken