
I have the following problem: I have a script in Keras that works like a charm, and I would now like to convert it to MXNet. The CNN in Keras looks like this:

model=Sequential()
model.add(Convolution2D(128, (3, 3), padding='same', activation='relu', name='block1_conv1', input_shape=(80,120,3)))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Convolution2D(256, (3, 3), padding='same', activation='relu', name='block2_conv1'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(2, activation = 'softmax', name='final_fully_connected'))

I thought the conversion to MXNet couldn't be that difficult, so I looked at the corresponding documentation and transferred the parameters to the best of my knowledge.

model=gluon.nn.Sequential()
with model.name_scope():
    model.add(gluon.nn.Conv2D(channels=128, kernel_size=(3, 3), activation='relu'))
    model.add(gluon.nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))            
    model.add(gluon.nn.Conv2D(channels=256, kernel_size=(3, 3), activation='relu'))
    model.add(gluon.nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    # The Flatten layer collapses all axes, except the first one, into one axis.
    model.add(gluon.nn.Flatten())
    model.add(gluon.nn.Dense(2, activation='relu'))

But if I try to train the model now, I get the following error:

"MXNetError: [17:01:34] C:\ci\libmxnet_1533399150922\work\src\operator\nn\pooling.cc:145: Check failed: param.kernel[1] <= dshape[3] + 2 * param.pad[1] kernel size (2) exceeds input (1 padded to 1)"

I think it has something to do with the dimensions of the kernel and the MaxPooling2D layer, but I don't understand the error because I thought I was actually building the same network as in Keras.

For completeness: My input variable X has the dimensions (80, 120, 3).

I would really appreciate the help of some Keras/MXNet pros.

4 Answers


My function to define the model:

# DEFINE THE MODEL
import mxnet as mx
from mxnet import gluon

# Note: `ctx` (e.g. ctx = mx.cpu() or mx.gpu()) is expected to be defined elsewhere in the script.
def create_model(load_file=None):
    num_outputs = 2                   # The number of outputs of the network.
    channels    = [128, 256]          # The number of filters (each with its own weights) in each convolution.
    kernel_size = (3, 3)              # The dimensions of the convolution window (i.e., the filter).
    padding     = (kernel_size[0]//2,
                   kernel_size[1]//2) # Zero-padding so the kernel can also process the border regions of the input (a 3x3 kernel needs one additional neighboring cell on each side). This reproduces Keras' padding='same'.
    pool_size   = (2, 2)              # The size of the pooling window (i.e., the region) from which the maximum value is taken.
    strides     = (2, 2)              # How many steps the pooling window moves. A pooling window of 2x2 with strides of 2x2 means the regions do not overlap.

    net = gluon.nn.Sequential(prefix='cnn_')
    with net.name_scope():
        net.add(gluon.nn.Conv2D(channels=channels[0], kernel_size=kernel_size, padding=padding, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=pool_size, strides=strides))            
        net.add(gluon.nn.Conv2D(channels=channels[1], kernel_size=kernel_size, padding=padding, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=pool_size, strides=strides))           
        # The Flatten layer collapses all axes, except the first one, into one axis.
        net.add(gluon.nn.Flatten())
        # In the Keras template the authors used activation='softmax'. This activation function does not
        # exist in Gluon, so we only reduce the output to the desired number of outputs here and apply
        # the softmax function to the network's output afterwards.
        net.add(gluon.nn.Dense(num_outputs))

    # Initialize the model parameters
    net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
#    net.collect_params().initialize(mx.init.Uniform(scale=1.0), ctx=ctx)


    # Optional: Load model parameters from a previous run
    if load_file:
        net.load_parameters(load_file, ctx=ctx)

    return net

Afterwards, whenever I predict the classes, I apply MXNet's softmax function:

y_pred = nd.softmax(net(data[0]))
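
To make this self-contained, here is a minimal usage sketch (the context ctx, the dummy input, and the variable names are just illustrative assumptions):

import mxnet as mx
from mxnet import nd

ctx = mx.cpu()                                   # or mx.gpu() if available
net = create_model()
# MXNet expects channels-first input, i.e. (batch, 3, 80, 120)
data = nd.random.uniform(shape=(1, 3, 80, 120), ctx=ctx)
y_pred = nd.softmax(net(data))                   # class probabilities, shape (1, 2)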

This is the exact translation (to the best of my knowledge) of your model, using the Gluon MXNet API.

from mxnet import gluon
from mxnet.gluon import HybridBlock


class YourNet(HybridBlock):
    def __init__(self, kernel_size=(3, 3), dilation=(1, 1), **kwargs):
        super(YourNet, self).__init__(**kwargs)

        # Use this scheme for padding='same' for **ODD** kernels
        px = dilation[0] * (kernel_size[0] - 1)//2
        py = dilation[1] * (kernel_size[1] - 1)//2

        pad = (px,py)

        # Here you DECLARE the layers, but you do not use them yet!
        with self.name_scope():
            self.conv1 = gluon.nn.Conv2D(channels=128,kernel_size=kernel_size,padding=pad,dilation=dilation,prefix='_block1_conv1')
            self.conv2 = gluon.nn.Conv2D(channels=256,kernel_size=kernel_size,padding=pad,dilation=dilation,prefix='_block2_conv2')

            self.last_layer = gluon.nn.Dense(units=2,prefix='_final_fully_connected')

            # You need only one pooling operation, since it doesn't have trainable
            # parameters
            self.pool = gluon.nn.MaxPool2D(pool_size=(2,2),strides=(2,2))


    def hybrid_forward(self, F, input):
        """
        In this function you specify how you want to use the layers you defined
        previously. F stands for functional, it has some additional 
        function definitions. There are multiple ways to achieve the same result 
        (using layers instead of F.SomeFunction). 
        """


        out = self.conv1(input) # pass input through first layer
        out = F.relu(out) # do the activation of the output
        out = self.pool(out) # Do max pooling after the activation
        out = self.conv2(out) # Now pass through second convolution
        out = F.relu(out) # another activation
        out = self.pool(out) # Again maxpool 2D


        out = F.flatten(out) # Flatten the output. Same effect as gluon.nn.Flatten()

        out = self.last_layer(out) # Apply last layer (dense)

        # Caution: pay attention to the axis the softmax is applied along
        out = F.softmax(out, axis=-1) # Softmax along the last axis (the class axis)

        # Once you are done, return the output.
        return out

usage:

from mxnet import nd

batch_size = 4   # example batch size for a quick test
net = YourNet()
net.initialize()
net.hybridize()  # roughly 3x speed-up (on GPUs) from using a HybridBlock

# Some random input
xx = nd.random.uniform(shape=[batch_size, 3, 80, 120])  # channels FIRST - MXNet's default layout (and faster on GPU)
out = net(xx)

# Try also net.summary(xx), without hybridizing first
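
As a quick sanity check (a sketch based on the class and random input above), the 'same' padding for the 3x3 kernel with dilation (1, 1) works out to (1, 1), so the spatial dimensions follow the same path as in the Keras model:

print(net(xx).shape)  # expected: (batch_size, 2)
# Spatial sizes along the way: 80x120 -> conv 'same' -> 80x120 -> pool -> 40x60
#                              -> conv 'same' -> 40x60 -> pool -> 20x30 -> flatten -> Dense(2)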
Foivos
  • Hello Foivos and thank you very much for your answer! I just want to finish the current training, but once this is through, I'll try your solution. It looks very promising. I'll get back to you as soon as the results are there. In any case, thank you very much for your efforts. – Stefan Renard Apr 11 '19 at 07:20
  • Pleasure! Try the mxnet forum for similar questions; you'll get more help there, as it is dedicated to mxnet. – Foivos Apr 12 '19 at 02:50
  • Hello Foivos, to be honest, I'm not quite sure if the net is doing the same thing. After the training I created a Class Activation Mapping (CAM) for both Keras and MXNet to show me the relevant regions over Europe. Unfortunately, I get different results. But maybe my implementation is just faulty? I will add my code below. – Stefan Renard Apr 17 '19 at 08:37
  • Hi Stefan, it is possible that the way you created the CAMs differs between the two networks; please post your code for both versions to spot any differences. I advise asking in the forum, as it is more flexible in terms of extended answers. – Foivos Apr 18 '19 at 02:35
  • Hello Foivos, I followed your advice and started a discussion in the mxnet forum: https://discuss.mxnet.io/t/how-to-convert-a-cnn-from-keras-to-mxnet/3811. – Stefan Renard Apr 23 '19 at 06:34

Okay, for those of you who might have a similar problem, here's the solution I figured out myself: the problem is that Keras and MXNet expect the image dimensions in a different order. Keras defaults to channels-last (height, width, channels), while MXNet expects channels-first (channels, height, width). A simple solution is to reorder the dimensions of the input so that the result is the same. In my case, an input X with the dimensions (3, 80, 120) instead of (80, 120, 3) gives me the same result.
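
As an illustration (a sketch with a made-up array, not the actual data pipeline), the reordering is a single transpose from Keras' channels-last layout (NHWC) to MXNet's channels-first layout (NCHW):

import numpy as np

# Hypothetical batch in Keras' channels-last layout: (batch, 80, 120, 3)
X_keras = np.zeros((16, 80, 120, 3), dtype='float32')

# Reorder to MXNet's channels-first layout: (batch, 3, 80, 120)
X_mxnet = np.transpose(X_keras, (0, 3, 1, 2))
print(X_mxnet.shape)  # (16, 3, 80, 120)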

  • This is correct, mxnet expects channels first. Also, in your mxnet solution you need to manually add padding=1 in the definition of the convolution kernels (to achieve padding='same'). – Foivos Apr 08 '19 at 04:12
  • @Foivos Thanks for your answer. As I found out, my mxnet network did not quite match the Keras network. Also, to achieve padding='same' in mxnet, padding must be set to kernel_size//2 (see https://discuss.mxnet.io/t/does-pooling-convention-refer-to-type-of-padding-being-used/1864/8 or https://discuss.mxnet.io/t/pooling-and-convolution-with-same-mode/528). And since the Softmax activation function doesn't exist in mxnet, I also had to apply it afterwards. – Stefan Renard Apr 09 '19 at 06:58

To add to the previous posts, there is another path: you can simply use the MXNet backend in Keras. See the keras-mxnet package: https://github.com/awslabs/keras-apache-mxnet

pip install keras-mxnet

and modify your ~/.keras/keras.json to be:

{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "mxnet",
    "image_data_format": "channels_first"
}
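
With "image_data_format" set to "channels_first", the data fed to the model also has to be channels-first, so the only change to the original model definition would be the input_shape (a sketch of just the affected line, assuming the rest of the script stays the same):

# Only the input_shape changes, since the backend now expects (channels, height, width)
model.add(Convolution2D(128, (3, 3), padding='same', activation='relu',
                        name='block1_conv1', input_shape=(3, 80, 120)))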
Thomas