
I want to implement CaffeNet in Keras, pre-trained on ImageNet, so I got the weights from the Caffe GitHub repository: https://github.com/BVLC/caffe/tree/master/models/bvlc_reference_caffenet

I converted them to weight.h5 with caffe_weight_converter. The weights I got for layer "conv2" have shape (256,48,5,5), but my implemented model needs (256,96,5,5).

I saw from "Got confused after I extracted weights from Trained caffenet" that this is because layer "conv2" is split into 2 groups. I want to ask: can Keras split a conv layer into groups? Or is there any other way to get a pretrained CaffeNet in Keras?
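For context, the shape mismatch follows directly from Caffe's `group: 2` semantics: each of the 256 conv2 filters only sees half of the 96 input channels, so the stored kernel is (256, 48, 5, 5). A minimal NumPy sketch (the dummy array stands in for the converted weights) shows how such a kernel maps onto two half-size Keras kernels:

```python
import numpy as np

# Caffe stores the group=2 conv2 kernel as (out, in/group, h, w):
# 256 output channels, each seeing only 48 of the 96 input channels.
caffe_w = np.zeros((256, 48, 5, 5), dtype="float32")

# A plain Keras Conv2D(256, (5, 5)) would expect a kernel of shape
# (5, 5, 96, 256), i.e. every filter sees all 96 inputs. With group=2
# the weights instead map onto two half-size kernels, one per group:
w_g1 = caffe_w[:128].transpose(2, 3, 1, 0)   # (5, 5, 48, 128), inputs 0..47
w_g2 = caffe_w[128:].transpose(2, 3, 1, 0)   # (5, 5, 48, 128), inputs 48..95
```

So one plain Conv2D cannot absorb these weights directly; they belong to two smaller convolutions.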

2 Answers


I've tried to implement the lower part of CaffeNet (LRN layer omitted):

A = Input((227, 227, 3))
B = Convolution2D(filters=96, kernel_size=(11, 11), strides=(4, 4), activation='relu')(A)
C = MaxPooling2D(pool_size=(3, 3), strides=(2, 2))(B)
D1 = Lambda(lambda x: x[:, :, :, :48])(C)
D2 = Lambda(lambda x: x[:, :, :, 48:])(C)
E = Concatenate()([D1, D2])
F = Convolution2D(filters=256, kernel_size=(5, 5), padding="same")(E)
model = Model(A, F)

Ref: Caffe Convolution "Group" parameter conversion to Keras Conv2D

Splitting the output of a layer over the channels
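Note that splitting the channels and concatenating them straight back together is effectively a no-op. A closer match to Caffe's `group: 2` is to run each half through its own smaller Conv2D and concatenate the *outputs*; a sketch of that idea (using `tf.keras` imports here, which is an assumption about the reader's setup):

```python
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D,
                                     Lambda, Concatenate)
from tensorflow.keras.models import Model

A = Input((227, 227, 3))
B = Conv2D(96, (11, 11), strides=(4, 4), activation='relu')(A)
C = MaxPooling2D((3, 3), strides=(2, 2))(B)

# group=2: each half of the 256 conv2 filters sees only 48 input channels
D1 = Lambda(lambda x: x[:, :, :, :48])(C)
D2 = Lambda(lambda x: x[:, :, :, 48:])(C)
E1 = Conv2D(128, (5, 5), padding='same')(D1)   # kernel shape (5, 5, 48, 128)
E2 = Conv2D(128, (5, 5), padding='same')(D2)
F = Concatenate(axis=-1)([E1, E2])             # 256 channels; batch dim untouched
model = Model(A, F)
```

Concatenating on the channel axis (`axis=-1`) keeps the batch dimension intact, and each 128-filter Conv2D now has exactly the (5, 5, 48, 128) kernel that half of the converted Caffe weights provides.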

keineahnung2345

@keineahnung2345 I can't post the code in a comment because it's too long, so I'm posting it as a new answer.

model_input = Input((227, 227, 3))

# conv1
x = Conv2D(filters=96, kernel_size=(11, 11), strides=(4, 4), name="conv1", activation="relu")(model_input)
x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name="pool1")(x)
x = BatchNormalization()(x)

# conv2
x = ZeroPadding2D((2, 2))(x)
conv2_split1 = Lambda(lambda z: z[:, :, :, :48])(x)
conv2_split2 = Lambda(lambda z: z[:, :, :, 48:])(x)
x = Concatenate(axis=0)([conv2_split1, conv2_split2])
x = Conv2D(filters=256, kernel_size=(5, 5), strides=(1, 1), name="conv2", activation="relu")(x)
x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name="pool2")(x)
x = BatchNormalization()(x)

# conv3
x = ZeroPadding2D((1, 1))(x)
x = Conv2D(filters=384, kernel_size=(3, 3), strides=(1, 1), name="conv3", activation="relu")(x)

# conv4
x = ZeroPadding2D((1, 1))(x)
conv4_split1 = Lambda(lambda z: z[:, :, :, :192])(x)
conv4_split2 = Lambda(lambda z: z[:, :, :, 192:])(x)
x = Concatenate(axis=0)([conv4_split1, conv4_split2])
x = Conv2D(filters=384, kernel_size=(3, 3), strides=(1, 1), name="conv4", activation="relu")(x)

# conv5
x = ZeroPadding2D((1, 1))(x)
conv5_split1 = Lambda(lambda z: z[:, :, :, :192])(x)
conv5_split2 = Lambda(lambda z: z[:, :, :, 192:])(x)
x = Concatenate(axis=0)([conv5_split1, conv5_split2])
x = Conv2D(filters=256, kernel_size=(3, 3), strides=(1, 1), name="conv5", activation="relu")(x)

# pool5
x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name="pool5")(x)
x = Flatten()(x)

# fc6
x = Dense(4096, activation='relu', name="fc6")(x)
# dropout6
x = Dropout(0.5, name="dropout6")(x)
# fc7
x = Dense(4096, activation='relu', name="fc7")(x)
# dropout7
x = Dropout(0.5, name="dropout7")(x)
# fc8
x = Dense(1000, activation='softmax', name="fc8")(x)

model = Model(inputs=model_input, outputs=x)
model.summary()
model.load_weights("caffeNet_kerasWeight.h5", by_name=True)
  • I can load the weights I converted from Caffe into this Keras model, but when I use it to predict I get this error: "ValueError: could not broadcast input array from shape (8,1000) into shape (1,1000)" – Nitiwat Sompawong Jan 16 '19 at 09:44
  • Does your input size match your batch size? – keineahnung2345 Jan 16 '19 at 15:20
  • I'm predicting with a single image. I think my implementation is wrong at `x=Concatenate(axis=0)([split1, split2])` because of `axis=0`: the next layer sees the input shape as `(2, featuremap_width, featuremap_height, featuremap_channels)`, i.e. a batch of 2. So I think the only way to implement CaffeNet with pretrained ImageNet weights in Keras is to give each Lambda-split its own Conv2D, split the pretrained weights across those Conv2D layers, and then concatenate the two outputs afterwards, like option 1 in this ref: https://groups.google.com/forum/#!topic/keras-users/bxPA4_Bda14. Do you have any comment? – Nitiwat Sompawong Jan 16 '19 at 18:42
  • `split Conv2D to each Lambda layer and split pretrain weight to each splited Conv2D and then concat 2 output after that like option 1` — maybe you are right. According to Figure 2 of http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf, the two 48-channel feature maps go through separate convolutional layers and are then concatenated. – keineahnung2345 Jan 17 '19 at 16:34
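Following the "option 1" idea from the comments, here is a hedged sketch of a grouped conv as two parallel Conv2D layers plus how the converted conv2 weights could be split across them. The helper name, the `_g1`/`_g2` layer names, and the Caffe weight layout `(out, in/group, h, w)` are assumptions for illustration (a random array stands in for the real converted weights); `tf.keras` imports are used here:

```python
import numpy as np
from tensorflow.keras.layers import Conv2D, Lambda, Concatenate

def grouped_conv(x, filters, kernel_size, name, group_channels):
    """Caffe-style group=2 convolution: split input channels, convolve
    each half separately, then concat along channels (axis=-1, NOT
    axis=0, which would double the batch dimension)."""
    g1 = Lambda(lambda z: z[:, :, :, :group_channels])(x)
    g2 = Lambda(lambda z: z[:, :, :, group_channels:])(x)
    c1 = Conv2D(filters // 2, kernel_size, activation='relu',
                name=name + '_g1')(g1)
    c2 = Conv2D(filters // 2, kernel_size, activation='relu',
                name=name + '_g2')(g2)
    return Concatenate(axis=-1)([c1, c2])

# Splitting a converted Caffe conv2 kernel (256, 48, 5, 5) into the
# two per-group Keras kernels of shape (h, w, in, out):
caffe_w = np.random.randn(256, 48, 5, 5).astype('float32')
caffe_b = np.zeros(256, dtype='float32')
w_g1 = caffe_w[:128].transpose(2, 3, 1, 0)  # (5, 5, 48, 128)
w_g2 = caffe_w[128:].transpose(2, 3, 1, 0)
# After building the model, load each half into its own layer:
# model.get_layer('conv2_g1').set_weights([w_g1, caffe_b[:128]])
# model.get_layer('conv2_g2').set_weights([w_g2, caffe_b[128:]])
```

Because the concatenation is over channels rather than the batch axis, a single-image prediction keeps batch size 1 and the broadcast error from the comments should not occur.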