
I'm trying to take the weights from a very simple Caffe model and translate them into a fully functional Keras model.

This is the original model definition in Caffe; let's call it simple.prototxt:

input: "im_data"
input_shape {
  dim: 1
  dim: 3
  dim: 1280
  dim: 1280
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "im_data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    pad: 5
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    pad: 0
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    kernel_size: 5
    pad: 2
    group: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}

The layer definition in Caffe might look complex, but it simply takes an image of dimensions 1280x1280x3, passes it to a convolutional layer, then max-pools it and passes the result to a final convolutional layer.

Here is its implementation in Keras, which is much simpler:

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, BatchNormalization
from keras.activations import relu

im_data = Input(shape=(1280, 1280, 3),
                   dtype='float32',
                   name='im_data')
conv1 = Conv2D(filters=96,
               kernel_size=11,
               strides=(4, 4),
               activation=relu,
               padding='same',
               name='conv1')(im_data)

pooling1 = MaxPooling2D(pool_size=(3, 3),
                        strides=(2, 2),
                        padding='same',
                        name='pooling1')(conv1)
normalized1 = BatchNormalization()(pooling1)  # https://stats.stackexchange.com/questions/145768/importance-of-local-response-normalization-in-cnn

conv2 = Conv2D(filters=256,
               kernel_size=5,
               activation=relu,
               padding='same',
               name='conv2')(normalized1)
model = Model(inputs=[im_data], outputs=conv2)  

Problem:

Although both models seem to have similar parameters in each layer, their weight shapes are not equal. I am aware that Caffe uses a different shape ordering than Keras, but the ordering is not the concern here.

The problem is that the last convolution layer in Keras has a different value in the 3rd dimension compared to the last convolution layer in Caffe. See below.


Weight shapes for Caffe:

>>> net = caffe.Net('simple.prototxt', 'premade_weights.caffemodel', caffe.TEST)
>>> for i in range(len(net.layers)):
...     if len(net.layers[i].blobs) != 0:  # skip layers with no weights
...         print(("name", net._layer_names[i]))
...         print(("weight_shapes", [v.data.shape for v in net.layers[i].blobs]))
('name', 'conv1')
('weight_shapes', [(96, 3, 11, 11), (96,)])
('name', 'conv2')
('weight_shapes', [(256, 48, 5, 5), (256,)])

Weight shapes for Keras:

>>> for layer in model.layers:
...     if len(layer.get_weights()) != 0:
...         print(("name", layer.name))
...         print(("weight_shapes", [w.shape for w in layer.get_weights()]))  
('name', 'conv1')
('weight_shapes', [(11, 11, 3, 96), (96,)])
('name', 'conv2')
('weight_shapes', [(5, 5, 96, 256), (256,)])

This seems to be weird behavior. As you can see, the conv1 shapes in Caffe and Keras are equal (ignoring the order). But in Caffe the conv2 shape is [(256, 48, 5, 5), (256,)], whereas in Keras the conv2 shape is [(5, 5, 96, 256), (256,)]. Notice that 48*2=96.
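
(For conv1, mapping between the two orderings really is just a transpose. A minimal sketch, using a placeholder array instead of the real blob:)

import numpy as np

# Caffe stores conv weights as (out, in, kH, kW), Keras as (kH, kW, in, out)
caffe_conv1_w = np.zeros((96, 3, 11, 11))  # placeholder for net.params['conv1'][0].data
keras_conv1_w = np.transpose(caffe_conv1_w, (2, 3, 1, 0))
print(keras_conv1_w.shape)  # (11, 11, 3, 96)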

Also, notice that the conv2 layer comes directly after the max pooling layer, so there might be something wrong with the max pooling layer in Keras.


Question:

Did I correctly translate the model definition from Caffe to Keras? Especially the max pooling layer and its parameters?

Thank you very much!


1 Answer


Pay attention to the group: 2 field in your conv2 definition. That means you have a grouped convolution there (Caffe: What does the group param mean?). Technically it means you have two sets of filters, each of shape (128, 48, 5, 5): the first one convolves with the first 48 input channels and produces the first 128 outputs, and the second one handles the remaining ones. However, Caffe stores both in a single blob, which is why its shape is (128x2, 48, 5, 5) = (256, 48, 5, 5).

There is no such parameter in the Keras Conv2D layer, but a widely adopted workaround is to split the input feature map with Lambda layers, process the halves with two separate convolutional layers, and then merge them back into a single feature map:

from keras.layers import Lambda, Concatenate

normalized1_1 = Lambda(lambda x: x[:, :, :, :48])(normalized1)
normalized1_2 = Lambda(lambda x: x[:, :, :, 48:])(normalized1)

conv2_1 = Conv2D(filters=128,
                 kernel_size=5,
                 activation=relu,
                 padding='same',
                 name='conv2_1')(normalized1_1)

conv2_2 = Conv2D(filters=128,
                 kernel_size=5,
                 activation=relu,
                 padding='same',
                 name='conv2_2')(normalized1_2)

conv2 = Concatenate(name='conv_2_merge')([conv2_1, conv2_2])

I did not check the code for correctness, but the idea should be something like this.
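
If you also want to copy the pretrained Caffe weights into this split layout, a minimal sketch (assuming `net` is the caffe.Net object from the question and that the model was rebuilt with the conv2_1/conv2_2 layers above) could be:

import numpy as np

# Caffe's conv2 blobs: weights (256, 48, 5, 5), bias (256,)
w, b = [blob.data for blob in net.params['conv2']]

# First 128 filters go to conv2_1, the rest to conv2_2;
# transpose from Caffe's (out, in, kH, kW) to Keras' (kH, kW, in, out).
model.get_layer('conv2_1').set_weights([np.transpose(w[:128], (2, 3, 1, 0)), b[:128]])
model.get_layer('conv2_2').set_weights([np.transpose(w[128:], (2, 3, 1, 0)), b[128:]])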

Concerning your task: converting networks from Caffe to Keras can be tricky. To get exactly the same result, you have to handle a lot of subtle things, like asymmetric padding in convolutions or different max-pooling behavior. That is also why, if you import the weights from Caffe, you probably cannot replace the LRN layer with batch normalization. Fortunately, there are implementations of LRN for Keras, for instance here.
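
For example, with the TensorFlow backend you can replace the BatchNormalization call from the question with a Lambda around TF's built-in LRN op. This is only a rough sketch; the parameter mapping (depth_radius = (local_size - 1) / 2 and alpha divided by local_size) is my reading of the two conventions, so verify it against the Caffe output:

import tensorflow as tf
from keras.layers import Lambda

# Caffe LRN: local_size=5, alpha=0.0001, beta=0.75
# Assumed mapping: depth_radius = (local_size - 1) / 2, alpha_tf = alpha_caffe / local_size
normalized1 = Lambda(
    lambda x: tf.nn.local_response_normalization(
        x, depth_radius=2, bias=1.0, alpha=0.0001 / 5, beta=0.75),
    name='norm1')(pooling1)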

  • Yes, I found that out later yesterday. From the documentation of the official convolution in Caffe, once I changed the group parameter back to `1`, everything was fine. But I was looking for the exact implementation, and I found it here. I was also concerned about padding and local response normalization; I used `ZeroPadding2D` before, but asymmetric padding seems to be different. I noticed that Caffe's `padding` doesn't affect the output shape, do you know why? Thank you very much for the great help! I appreciate it! – ShellRox Feb 10 '19 at 15:59
  • The `pad` parameter in a Caffe convolutional layer must affect the output shape, what do you mean? Asymmetric padding may happen in TensorFlow under some conditions (stride > 1, padding='same'), and to avoid it you can use a combination of tf.pad + Conv2D with padding 'valid'. See the conv2d_same function from https://programtalk.com/vs2/python/12827/models/slim/nets/resnet_utils.py for an example. – Dmytro Prylipko Feb 10 '19 at 20:22
  • Is this another weird behavior? For example, let's take `pad: 5` parameter for the first layer `conv1`, and set it to `pad: 10`, the shape of the layer remains the same `(96, 3, 11, 11)`. Is this normal behavior? If not I guess this is a concern for another answer. – ShellRox Feb 10 '19 at 20:37
  • I apologize, I was a little perplexed and I read weight shapes instead of an actual output shape. Padding does affect the output like it should normally. I guess I will create my own layer with `tf.pad` and `"symmetric"` option. Thank you! – ShellRox Feb 10 '19 at 20:46
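
(A small illustration of the point settled in the last comments, using Caffe's standard output-size formula: pad changes the spatial output shape, never the weight shape.)

import math

def caffe_conv_out(in_size, kernel, pad, stride):
    # Caffe's output-size formula for convolution layers
    return math.floor((in_size + 2 * pad - kernel) / stride) + 1

print(caffe_conv_out(1280, 11, pad=5, stride=4))   # 320
print(caffe_conv_out(1280, 11, pad=10, stride=4))  # 323 -- output grows, weight blob stays (96, 3, 11, 11)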