NOTE:
I am new to MXNet. It seems that the Gluon module is meant to replace(?) the Symbol module as the high-level neural network (nn) interface, so this question specifically seeks an answer utilizing the Gluon module.
Context
Residual neural networks (res-NNs) are a fairly popular architecture (the link provides a review of res-NNs). In brief, a res-NN is an architecture where the input undergoes a (series of) transformation(s) (e.g. through a standard nn layer) and at the end is combined with its unadulterated self prior to an activation function:
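To make the pattern concrete, this is the computation I have in mind, written as plain NDArray operations (my own sketch; transform stands in for whatever stack of layers is used):

import mxnet as mx

def residual(x, transform):
    """The residual pattern described above: transform the input, add the
    untouched input back in, then apply the activation."""
    y = transform(x)        # transformed branch (e.g. conv -> batch norm)
    y = y + x               # skip connection with the unadulterated input
    return mx.nd.relu(y)    # activation applied after the combination

# e.g. residual(x, lambda t: t * 2) on any NDArray x of matching shape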
So the main question here is "How to implement a res-NN structure with a custom gluon.Block?" What follows is:
- my attempt at doing this (which is incomplete and probably has errors)
- subquestions, highlighted as block quotes
Normally sub-questions are seen as concurrent main questions, resulting in the post being flagged as too general. In this case they are legitimate sub-questions, as my inability to solve the main question stems from them, and the partial / first-draft documentation of the gluon module is insufficient to answer them.
Main Question
"How to implement a res-NN structure with a custom gluon.Block?"
First let's do some imports:
import mxnet as mx
import numpy as np
import math
import random
gpu_device=mx.gpu()
ctx = gpu_device
Prior to defining our res-NN structure, we first define a common convolution NN (cnn) architecture; namely, convolution → batch norm. → ramp.
class CNN1D(mx.gluon.Block):
    def __init__(self, channels, kernel, stride=1, padding=0, **kwargs):
        super(CNN1D, self).__init__(**kwargs)
        with self.name_scope():
            # pass the stride argument through to the conv layer
            self.conv = mx.gluon.nn.Conv1D(channels=channels, kernel_size=kernel, strides=stride, padding=padding)
            self.bn = mx.gluon.nn.BatchNorm()
            self.ramp = mx.gluon.nn.Activation(activation='relu')

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.ramp(x)
        return x
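For reference, this is how I am calling the chunk (my own quick shape check, nothing more):

chunk = CNN1D(10, 3, padding=1)
chunk.initialize(mx.init.Xavier(), ctx=ctx)
x = mx.nd.ones(shape=(1, 4, 20), ctx=ctx)   # (batch, channels, width)
print(chunk(x).shape)   # expect (1, 10, 20) with kernel=3, padding=1, stride=1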
Subquestion: mx.gluon.nn.Activation vs the NDArray module's nd.relu? When to use which, and why? In all the MXNet tutorials / demos I saw in their documentation, custom gluon.Blocks use nd.relu(x) in the forward function.

Subquestion: self.ramp(self.conv(x)) vs mx.gluon.nn.Conv1D(activation='relu')(x)? i.e. what is the consequence of adding the activation argument to a layer? Does that mean the activation is automatically applied in the forward function when that layer is called?
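For concreteness, these are the three variants I am comparing (my own toy blocks, not taken from the docs; whether they really behave the same is exactly what I am asking):

import mxnet as mx

# Variant 1: a dedicated Activation layer, applied explicitly in forward
class ViaActivationLayer(mx.gluon.Block):
    def __init__(self, **kwargs):
        super(ViaActivationLayer, self).__init__(**kwargs)
        with self.name_scope():
            self.conv = mx.gluon.nn.Conv1D(channels=10, kernel_size=3)
            self.ramp = mx.gluon.nn.Activation(activation='relu')

    def forward(self, x):
        return self.ramp(self.conv(x))

# Variant 2: nd.relu called directly in forward (the style I see in the tutorials)
class ViaNDRelu(mx.gluon.Block):
    def __init__(self, **kwargs):
        super(ViaNDRelu, self).__init__(**kwargs)
        with self.name_scope():
            self.conv = mx.gluon.nn.Conv1D(channels=10, kernel_size=3)

    def forward(self, x):
        return mx.nd.relu(self.conv(x))

# Variant 3: the activation baked into the layer via its activation argument
class ViaActivationArg(mx.gluon.Block):
    def __init__(self, **kwargs):
        super(ViaActivationArg, self).__init__(**kwargs)
        with self.name_scope():
            self.conv = mx.gluon.nn.Conv1D(channels=10, kernel_size=3, activation='relu')

    def forward(self, x):
        return self.conv(x)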
Now that we have a re-usable cnn chunk, let's define a res-NN where:
- there are chain_length number of cnn chunks
- the first cnn chunk uses a different stride than all the subsequent ones
So here is my attempt:
class RES_CNN1D(mx.gluon.Block):
    def __init__(self, channels, kernel, initial_stride, chain_length=1, stride=1, padding=0, **kwargs):
        super(RES_CNN1D, self).__init__(**kwargs)
        with self.name_scope():
            num_rest = chain_length - 1
            self.ramp = mx.gluon.nn.Activation(activation='relu')
            self.init_cnn = CNN1D(channels, kernel, initial_stride, padding)
            # I am guessing this is how to correctly add an arbitrary number of chunks
            self.rest_cnn = mx.gluon.nn.Sequential()
            for i in range(num_rest):
                self.rest_cnn.add(CNN1D(channels, kernel, stride, padding))

    def forward(self, x):
        # make a copy of the untouched input to send through the chunks
        y = x.copy()
        y = self.init_cnn(y)
        # I am guessing that if I call a mx.gluon.nn.Sequential object, all nets inside are called / the input gets passed along all of them?
        y = self.rest_cnn(y)
        y += x
        y = self.ramp(y)
        return y
Subquestion: when adding a variable number of layers, should one use the hacky eval("self.layer" + str(i) + " = mx.gluon.nn.Conv1D()"), or is this what mx.gluon.nn.Sequential is meant for?

Subquestion: when defining the forward function in a custom gluon.Block which has an instance of mx.gluon.nn.Sequential (let us refer to it as self.seq), does self.seq(x) just pass the argument x down the line? e.g. if this is self.seq:

self.seq = mx.gluon.nn.Sequential()
self.conv1 = mx.gluon.nn.Conv1D()
self.conv2 = mx.gluon.nn.Conv1D()
self.seq.add(self.conv1)
self.seq.add(self.conv2)

is self.seq(x) equivalent to self.conv2(self.conv1(x))?
Is this correct?
The desired result for RES_CNN1D(10, 3, 2, chain_length=3) should look like this:
Conv1D(10, 3, stride=2) -----
BatchNorm |
Ramp |
Conv1D(10, 3) |
BatchNorm |
Ramp |
Conv1D(10, 3) |
BatchNorm |
Ramp |
| |
(+)<-------------------------
v
Ramp
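And, in case it helps an answer, this is how I would smoke-test the block above (whether my attempt even runs is part of the question; stride 1 and padding 1 are chosen here just so the skip connection's shapes line up):

net = RES_CNN1D(10, 3, 1, chain_length=3, padding=1)
net.initialize(mx.init.Xavier(), ctx=ctx)

# (batch, channels, width); channels must equal 10 so that y += x is valid
x = mx.nd.ones(shape=(1, 10, 20), ctx=ctx)
print(net(x).shape)   # hoping for (1, 10, 20)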