
I am working on a transfer learning approach and got very different results when using the MobileNetV2 from keras.applications and the one available on TensorFlow Hub. This seems strange to me, as both versions claim here and here to extract their weights from the same checkpoint mobilenet_v2_1.0_224. This is how the differences can be reproduced; you can find the Colab notebook here:

!pip install tensorflow-gpu==2.1.0
import tensorflow as tf
import numpy as np
import tensorflow_hub as hub
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2

def create_model_keras():
  image_input = tf.keras.Input(shape=(224, 224, 3))
  out = MobileNetV2(input_shape=(224, 224, 3),
                    include_top=True)(image_input)
  model = tf.keras.models.Model(inputs=image_input, outputs=out)
  model.compile(optimizer='adam', loss=["categorical_crossentropy"])
  return model

def create_model_tf():
  image_input = tf.keras.Input(shape=(224, 224, 3))
  out = hub.KerasLayer("https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/4",
                      input_shape=(224, 224, 3))(image_input)
  model = tf.keras.models.Model(inputs=image_input, outputs=out)
  model.compile(optimizer='adam', loss=["categorical_crossentropy"])
  return model

When I try to predict on a random batch, the results are not equal:

keras_model = create_model_keras()
tf_model = create_model_tf()
np.random.seed(42)
data = np.random.rand(32, 224, 224, 3)
out_keras = keras_model.predict_on_batch(data)
out_tf = tf_model.predict_on_batch(data)
np.array_equal(out_keras, out_tf)

The output of the version from keras.applications sums to 1, but the output of the TensorFlow Hub version does not. The output shapes also differ: the TensorFlow Hub version has 1001 labels, whereas keras.applications has 1000.

np.sum(out_keras[0]), np.sum(out_tf[0])

prints (1.0000001, -14.166359)
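One observation: applying a softmax to the TF Hub output makes each row sum to 1 as well, which suggests it returns unnormalized scores:

np.sum(tf.nn.softmax(out_tf[0]).numpy())  # ~1.0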

What is the reason for these differences? Am I missing something?

Edit 18.02.2020

As Szymon Maszke pointed out, the TF Hub version returns logits. That's why I added a Softmax layer to create_model_tf as follows: out = tf.keras.layers.Softmax()(x)
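For reference, a minimal sketch of the updated function (the Softmax is applied to the hub layer's output, here named x):

def create_model_tf():
  image_input = tf.keras.Input(shape=(224, 224, 3))
  x = hub.KerasLayer("https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/4",
                     input_shape=(224, 224, 3))(image_input)
  out = tf.keras.layers.Softmax()(x)  # normalize the 1001 logits to probabilities
  model = tf.keras.models.Model(inputs=image_input, outputs=out)
  model.compile(optimizer='adam', loss=["categorical_crossentropy"])
  return model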

arnoegw mentioned that the TF Hub version requires images normalized to [0,1], whereas the Keras version requires normalization to [-1,1]. I now use the following preprocessing on a test image, the first block for the Keras model and the second for the TF Hub model:

# Keras model: preprocess_input scales the image to [-1, 1]
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
img_keras = tf.keras.preprocessing.image.load_img("/content/panda.jpeg", target_size=(224, 224))
img_keras = tf.keras.preprocessing.image.img_to_array(img_keras)
img_keras = preprocess_input(img_keras)

# TF Hub model: convert_image_dtype yields float32 in [0, 1]
img_tf = tf.io.read_file("/content/panda.jpeg")
img_tf = tf.image.decode_jpeg(img_tf)
img_tf = tf.image.convert_image_dtype(img_tf, tf.float32)
img_tf = tf.image.resize(img_tf, (224, 224))

Both now correctly predict the same label, and the following condition is true: np.allclose(out_keras, out_tf[:, 1:], rtol=0.8). Index 0 of the TF Hub output is the extra "background" class, hence the out_tf[:, 1:] slice.
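For completeness, a sketch of that comparison (img_keras and img_tf are the preprocessed images from above, expanded to batches of one):

out_keras = keras_model.predict_on_batch(np.expand_dims(img_keras, axis=0))
out_tf = tf_model.predict_on_batch(tf.expand_dims(img_tf, axis=0))
# Index 0 of the TF Hub output is the extra "background" class, so skip it:
print(np.allclose(out_keras, out_tf[:, 1:], rtol=0.8))  # True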

Edit 2, 18.02.2020: Earlier I wrote that it is not possible to convert the formats into each other. That was caused by a bug in my code.

  • `tensorflow`'s version probably returns `logits` (unnormalized probabilities). You can apply `cross entropy` loss on top of it to get probabilities. When this thing is done you can compare outputs from both (e.g. whether returned probabilities are reasonably close to each other). Summing will not tell you much in this case. – Szymon Maszke Feb 16 '20 at 20:02
  • Why would I need to apply a loss function on top? Do you maybe mean a softmax activation to normalize between 0 and 1? If I add this line to the `create_model_tf`, `out = tf.keras.layers.Softmax()(x)`, I still get very different results, but of course this time normalized to [0,1] – FrozenFennek Feb 17 '20 at 10:46
  • Oh gosh, sorry, I meant `softmax`, my bad. Return values looked like summed logits. Are you sure both models are pretrained and not randomly initialized? – Szymon Maszke Feb 17 '20 at 10:53
  • They are not randomly initialized. When I extract the weights like this: `keras_weights = keras_model.layers[1].get_weights()` `tf_weights = tf_model.layers[1].get_weights()`, the ordering of layers is very different, such that I cannot see the pattern. But it is possible to find layers that correspond, like `np.array_equal(tf_weights[41], keras_weights[255])` and `np.array_equal(tf_weights[53], keras_weights[205])`, so I assume they use the same weights – FrozenFennek Feb 17 '20 at 11:17
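A brute-force way to look for such correspondences, as a sketch (keras_weights and tf_weights as extracted in the comment above; only tensors of matching shape are compared):

for i, tw in enumerate(tf_weights):
  for j, kw in enumerate(keras_weights):
    if tw.shape == kw.shape and np.array_equal(tw, kw):
      print(f"tf_weights[{i}] == keras_weights[{j}]")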

1 Answer


There are several documented differences (a sketch combining them follows the list):

  • Like Szymon said, the TF Hub version returns logits (before the softmax function that turns them into probabilities), which is a common practice, because the cross-entropy loss can be computed with greater numerical stability from the logits.

  • The TF Hub model assumes float32 inputs in the range of [0,1], which is what you get from tf.image.decode_jpeg(...) followed by tf.image.convert_image_dtype(..., tf.float32). The Keras code uses a model-specific range (likely [-1,+1]).

  • The TF Hub model reflects the original SLIM checkpoint more completely in returning all its 1001 output classes. As stated in the ImageNetLabels.txt linked from the documentation, the added class 0 is "background" (aka. "stuff"). That is what object detection uses to indicate image background as opposed to an object of any known class.
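A short sketch combining the three points (illustrative; tf_model is the original logits-returning model from the question):

img = tf.io.read_file("/content/panda.jpeg")
img = tf.image.convert_image_dtype(tf.image.decode_jpeg(img), tf.float32)  # float32 in [0,1]
img = tf.image.resize(img, (224, 224))
logits = tf_model.predict_on_batch(tf.expand_dims(img, axis=0))  # shape (1, 1001)
probs = tf.nn.softmax(logits, axis=-1).numpy()                   # rows now sum to 1
probs_1000 = probs[:, 1:]  # drop class 0 ("background") to match the 1000 Keras classes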

arnoegw
  • Thanks for your answer. Still shouldn't it be possible to convert the two ranges to each other? I updated my question to show the problem. – FrozenFennek Feb 18 '20 at 18:17
  • Sorry my remarks were caused by a bug. Thanks a lot for your help. – FrozenFennek Feb 18 '20 at 18:36
  • If you load your ImageNet dataset using `ds=tf.keras.preprocessing.image_dataset_from_directory(...)` you can transform the resulting labels of `ds` from 1000 into 1001 by using `padding = tf.constant([[0,0], [1,0]]); ds = ds.map(lambda x,y: (x, tf.pad(y, padding, mode='CONSTANT')))` – Anna Christine Aug 24 '23 at 11:27