I'm trying to take the weights from a very simple Caffe model and translate them into a fully functional Keras model.
This is the original model definition in Caffe; let's call it simple.prototxt:
input: "im_data"
input_shape {
dim: 1
dim: 3
dim: 1280
dim: 1280
}
layer {
name: "conv1"
type: "Convolution"
bottom: "im_data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
pad: 5
stride: 4
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
pad: 0
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
kernel_size: 5
pad: 2
group: 2
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
The layer definition in Caffe might look complex, but it just takes an image of dimensions 1280x1280x3, passes it through a convolutional layer, then max-pools and normalizes it, and passes it to a final convolutional layer.
Here is its implementation in Keras, which is much simpler:
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, BatchNormalization
from keras.activations import relu

im_data = Input(shape=(1280, 1280, 3),
                dtype='float32',
                name='im_data')
conv1 = Conv2D(filters=96,
               kernel_size=11,
               strides=(4, 4),
               activation=relu,
               padding='same',
               name='conv1')(im_data)
pooling1 = MaxPooling2D(pool_size=(3, 3),
                        strides=(2, 2),
                        padding='same',
                        name='pooling1')(conv1)
# BatchNormalization used in place of Caffe's LRN, see:
# https://stats.stackexchange.com/questions/145768/importance-of-local-response-normalization-in-cnn
normalized1 = BatchNormalization()(pooling1)
conv2 = Conv2D(filters=256,
               kernel_size=5,
               activation=relu,
               padding='same',
               name='conv2')(normalized1)
model = Model(inputs=[im_data], outputs=conv2)
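(Side note: Keras has no built-in LRN layer, which is why I substituted BatchNormalization above. If I wanted to mirror Caffe's LRN more closely, my understanding is that I could wrap TensorFlow's op in a Lambda layer instead of the BatchNormalization line, as sketched below. The parameter mapping is my assumption, not verified: TF's depth_radius would be (local_size - 1) / 2 and its alpha would be Caffe's alpha divided by local_size.)

# Sketch only: approximate Caffe's LRN (local_size=5, alpha=0.0001, beta=0.75)
# with TensorFlow's op wrapped in a Lambda layer (assumes a TF backend).
# Assumed mapping: depth_radius = (local_size - 1) / 2, alpha_tf = alpha / local_size.
import tensorflow as tf
from keras.layers import Lambda

normalized1 = Lambda(lambda x: tf.nn.local_response_normalization(
    x, depth_radius=2, bias=1.0, alpha=0.0001 / 5, beta=0.75),
    name='norm1')(pooling1)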
Problem:
Although both models seem to have similar parameters in each layer, their weight shapes are not equal. I am aware that Caffe uses a different shape ordering than Keras, but ordering is not the concern here.
The problem is that the last convolutional layer in Keras has a different value in the third dimension than the last convolutional layer in Caffe. See below.
Weight shapes for Caffe:
>>> net = caffe.Net('simple.prototxt', 'premade_weights.caffemodel', caffe.TEST)
>>> for i in range(len(net.layers)):
...     if len(net.layers[i].blobs) != 0:  # skip layers without weights
...         print(("name", net._layer_names[i]))
...         print(("weight_shapes", [v.data.shape for v in net.layers[i].blobs]))
('name', 'conv1')
('weight_shapes', [(96, 3, 11, 11), (96,)])
('name', 'conv2')
('weight_shapes', [(256, 48, 5, 5), (256,)])
Weight shapes for Keras:
>>> for layer in model.layers:
...     if len(layer.get_weights()) != 0:
...         print(("name", layer.name))
...         print(("weight_shapes", [w.shape for w in layer.get_weights()]))
('name', 'conv1')
('weight_shapes', [(11, 11, 3, 96), (96,)])
('name', 'conv2')
('weight_shapes', [(5, 5, 96, 256), (256,)])
This seems like weird behavior. As you can see, the conv1 shapes in Caffe and Keras are equal (ignoring the order). But in Caffe the conv2 shape is [(256, 48, 5, 5), (256,)], whereas in Keras the conv2 shape is [(5, 5, 96, 256), (256,)]; notice that 48*2=96.
Also, notice that the conv2 layer comes directly after the max-pooling layer, so there might be something wrong with the max-pooling layer in Keras.
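One thing I did notice: the conv2 definition in simple.prototxt has group: 2, and my Keras Conv2D does not replicate that. If I understand Caffe's grouped convolution correctly, each filter only sees in_channels / group input channels, which would explain the 48. Here is a quick sanity check of that shape arithmetic (my own sketch, not part of either model):

# Sketch: expected conv2 weight shapes under my understanding of grouping.
in_channels, num_output, k, group = 96, 256, 5, 2

caffe_shape = (num_output, in_channels // group, k, k)  # each filter sees 96/2 = 48 channels
keras_shape = (k, k, in_channels, num_output)           # plain Conv2D sees all 96 channels

print(caffe_shape)  # (256, 48, 5, 5) -- matches the Caffe dump above
print(keras_shape)  # (5, 5, 96, 256) -- matches the Keras dump above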
Question:
Did I correctly interpret the model definition from Caffe to Keras, especially the max-pooling layer and its parameters?
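For context, this is how I intend to copy the weights over once the shapes line up (a sketch; the (2, 3, 1, 0) transpose is my assumption for converting Caffe's (out, in, h, w) layout to Keras' (h, w, in, out)):

# Sketch: copy Caffe weights into the Keras model, assuming matching shapes.
for name in ('conv1', 'conv2'):
    w = net.params[name][0].data.transpose(2, 3, 1, 0)  # (out, in, h, w) -> (h, w, in, out)
    b = net.params[name][1].data
    model.get_layer(name).set_weights([w, b])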
Thank you very much!