
I am reading a tutorial about MXNet. The writers use ‘mxnet.gluon.nn.Sequential()’ as a container to store some blocks (see code 1); then they redefine the connections between the blocks in ‘def forward(self, x)’ (see codes 2 and 3). Are there any side effects of doing this? Also, what is the difference between ‘Sequential()’ and ‘HybridSequential()’? I tried replacing the ‘Sequential’ with a plain list, and I get the following warning during initialization:

"ToySSD.downsamplers" is a container with Blocks. Note that Blocks inside the list, tuple or dict will not be registered automatically. Make sure to register them using register_child() or switching to nn.Sequential/nn.HybridSequential instead.

As far as I know, if you put some blocks in ‘mxnet.gluon.nn.Sequential()’ or ‘mxnet.gluon.nn.HybridSequential()’, you are telling the computer that these blocks are connected in sequence. However, if you define the relationships between the blocks in the ‘forward’ function, you are telling the computer to connect them in a different way. Will this lead to a conflict? And if I only wire up some of the block connections in ‘forward’, what happens to the other blocks inside ‘Sequential()’ that are never used in the ‘forward’ function?

The entire tutorial can be found here.

code 1:

def toy_ssd_model(num_anchors, num_classes):
    downsamplers = nn.Sequential()
    for _ in range(3):
        downsamplers.add(down_sample(128))

    class_predictors = nn.Sequential()
    box_predictors = nn.Sequential()    
    for _ in range(5):
        class_predictors.add(class_predictor(num_anchors, num_classes))
        box_predictors.add(box_predictor(num_anchors))

    model = nn.Sequential()
    model.add(body(), downsamplers, class_predictors, box_predictors)
    return model

code 2:

def toy_ssd_forward(x, model, sizes, ratios, verbose=False):    
    body, downsamplers, class_predictors, box_predictors = model
    anchors, class_preds, box_preds = [], [], []
    # feature extraction    
    x = body(x)
    for i in range(5):
        # predict
        anchors.append(MultiBoxPrior(
            x, sizes=sizes[i], ratios=ratios[i]))
        class_preds.append(
            flatten_prediction(class_predictors[i](x)))
        box_preds.append(
            flatten_prediction(box_predictors[i](x)))
        if verbose:
            print('Predict scale', i, x.shape, 'with', 
                  anchors[-1].shape[1], 'anchors')
        # down sample
        if i < 3:
            x = downsamplers[i](x)
        elif i == 3:
            x = nd.Pooling(
                x, global_pool=True, pool_type='max', 
                kernel=(x.shape[2], x.shape[3]))
    # concat data
    return (concat_predictions(anchors),
            concat_predictions(class_preds),
            concat_predictions(box_preds))

code 3:

from mxnet import gluon
class ToySSD(gluon.Block):
    def __init__(self, num_classes, verbose=False, **kwargs):
        super(ToySSD, self).__init__(**kwargs)
        # anchor box sizes and ratios for 5 feature scales
        self.sizes = [[.2,.272], [.37,.447], [.54,.619], 
                      [.71,.79], [.88,.961]]
        self.ratios = [[1,2,.5]]*5
        self.num_classes = num_classes
        self.verbose = verbose
        num_anchors = len(self.sizes[0]) + len(self.ratios[0]) - 1
        # use name_scope to guard the names
        with self.name_scope():
            self.model = toy_ssd_model(num_anchors, num_classes)

    def forward(self, x):
        anchors, class_preds, box_preds = toy_ssd_forward(
            x, self.model, self.sizes, self.ratios, 
            verbose=self.verbose)
        # it is better to have class predictions reshaped for softmax computation       
        class_preds = class_preds.reshape(shape=(0, -1, self.num_classes+1))
        return anchors, class_preds, box_preds
Blue Bird

1 Answer


In Gluon, networks are built using Blocks. If something is not a Block, it cannot be part of a Gluon network. A Dense layer is a Block, a Convolution is a Block, a Pooling layer is a Block, and so on.
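For instance, even a single predefined layer is a Block that you can initialize and call on data directly (a minimal sketch; the layer size and input shape are arbitrary):

from mxnet import nd, gluon

layer = gluon.nn.Dense(4)                     # a Dense layer is itself a Block
layer.initialize()                            # parameters are created lazily
y = layer(nd.random.uniform(shape=(2, 3)))    # shapes are inferred on first use
print(y.shape)                                # (2, 4)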

Sometimes you might want a Block that is not a pre-defined block in Gluon but is a sequence of predefined Gluon blocks. For example,

Conv2D -> MaxPool2D -> Conv2D -> MaxPool2D -> Flatten -> Dense -> Dense

Gluon doesn't have a predefined block that performs the above sequence of operations. But Gluon does have Blocks that perform each of the individual operations. So you can create your own block that performs the above sequence by stringing together predefined Gluon blocks. Example:

from mxnet import gluon

num_outputs = 10  # for example, a 10-class classification problem
net = gluon.nn.HybridSequential()

with net.name_scope():

    # First convolution
    net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))

    # Second convolution
    net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))

    # Flatten the output before the fully connected layers
    net.add(gluon.nn.Flatten())

    # First fully connected layers with 512 neurons
    net.add(gluon.nn.Dense(512, activation="relu"))

    # Second fully connected layer with as many neurons as the number of classes
    net.add(gluon.nn.Dense(num_outputs))

When you create a sequence like that, you can either use HybridSequential or Sequential. To understand the difference, you need to understand the difference between symbolic and imperative programming.

  • HybridBlock is a Block that can be converted into a symbolic graph for faster execution. HybridSequential is a sequence of HybridBlocks.
  • A Block (not a hybrid one) cannot be converted into a symbolic graph. Sequential is a sequence of non-hybrid Blocks.

Whether or not a block is hybrid depends on how it is implemented. Almost all predefined Gluon blocks are also HybridBlocks. Sometimes there is a reason why a block cannot be hybrid; Tree LSTM is one example. More often, something is not hybrid simply because whoever wrote it didn't put in the effort to make it hybrid (for example, because hybridizing it wouldn't give a big performance boost, or because it is hard to make that particular block hybrid).
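To make the difference concrete: a hybrid network is defined imperatively, but calling hybridize() lets MXNet compile and cache a symbolic graph on the next forward pass. A minimal sketch (the layer sizes and input shape are arbitrary):

from mxnet import nd, gluon

net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Dense(64, activation='relu'))
    net.add(gluon.nn.Dense(10))

net.initialize()
net.hybridize()                                  # request compilation into a symbolic graph
out = net(nd.random.uniform(shape=(4, 128)))     # first call builds and caches the graph
print(out.shape)                                 # (4, 10)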

Note that Sequential and HybridSequential are not just containers like a Python list. When you use one of them, you are actually creating a new Block out of preexisting blocks, and the child blocks get registered with the parent so their parameters are tracked. This is why you cannot replace Sequential with a plain Python list.
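That is exactly what the warning in your question is pointing at. If you really want to keep child blocks in a plain Python list, you have to register each one by hand with register_child(); nn.Sequential does that registration for you. A minimal sketch (ListBlock is a hypothetical name, layer sizes are arbitrary):

from mxnet import gluon
from mxnet.gluon import nn

class ListBlock(gluon.Block):
    def __init__(self, **kwargs):
        super(ListBlock, self).__init__(**kwargs)
        with self.name_scope():
            # A plain list: its contents are NOT registered automatically,
            # so collect_params() would miss them without register_child().
            self.layers = [nn.Dense(16), nn.Dense(8)]
            for layer in self.layers:
                self.register_child(layer)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x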

Okay, so now you know how to create your own block by stringing together preexisting blocks. But what if you don't want to just pass the data through a fixed sequence of blocks? What if you want to pass the data through one of those blocks only conditionally? Here is an example from ResNet:

# Taken from Gluon's ResNet implementation in the model zoo; _conv3x3 is a small
# helper there that returns a 3x3 nn.Conv2D with the given stride and channels.
class BasicBlockV1(HybridBlock):
    def __init__(self, channels, stride, downsample=False, in_channels=0, **kwargs):
        super(BasicBlockV1, self).__init__(**kwargs)
        self.body = nn.HybridSequential(prefix='')
        self.body.add(_conv3x3(channels, stride, in_channels))
        self.body.add(nn.BatchNorm())
        self.body.add(nn.Activation('relu'))
        self.body.add(_conv3x3(channels, 1, channels))
        self.body.add(nn.BatchNorm())
        if downsample:
            self.downsample = nn.HybridSequential(prefix='')
            self.downsample.add(nn.Conv2D(channels, kernel_size=1, strides=stride,
                                          use_bias=False, in_channels=in_channels))
            self.downsample.add(nn.BatchNorm())
        else:
            self.downsample = None

    def hybrid_forward(self, F, x):
        residual = x

        x = self.body(x)

        if self.downsample:
            residual = self.downsample(residual)

        x = F.Activation(residual+x, act_type='relu')

        return x

This code creates a new Block using preexisting Gluon blocks, but it does more than just run the data through them. Given some data, the block always runs it through the body block, and then runs it through downsample only if the Block was created with downsample set to True. Finally, it adds the output of body to the (possibly downsampled) residual to produce the result. As you can see, there is more happening than just passing data through a sequence of Blocks. This is when you create your own block by subclassing HybridBlock or Block.

Note that the __init__ function creates the necessary blocks, and the forward function takes the input and runs it through the blocks created in __init__. forward does not modify those blocks; it only runs the data through them.

In the example you quoted, the first code block creates blocks like downsamplers, class_predictors and box_predictors. The forward functions in code blocks 2 and 3 do not modify those blocks; they merely pass the input data through them.
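A minimal sketch of that same pattern (MiniSSD is a hypothetical name, and the layer sizes are arbitrary): the blocks are created and registered in __init__ via nn.Sequential, and forward only indexes into the container and routes data through it, just like downsamplers[i](x) in code 2:

from mxnet import nd, gluon
from mxnet.gluon import nn

class MiniSSD(gluon.Block):
    def __init__(self, **kwargs):
        super(MiniSSD, self).__init__(**kwargs)
        with self.name_scope():
            self.downsamplers = nn.Sequential()
            for _ in range(3):
                self.downsamplers.add(nn.Conv2D(32, kernel_size=3, strides=2,
                                                padding=1, activation='relu'))

    def forward(self, x):
        outputs = []
        for i in range(3):
            x = self.downsamplers[i](x)   # pick one registered child block at a time
            outputs.append(x)
        return outputs

net = MiniSSD()
net.initialize()
feats = net(nd.random.uniform(shape=(1, 3, 64, 64)))
print([f.shape for f in feats])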

Indhu Bharathi
  • Thanks for the introduction to symbolic and imperative programming. I infer that ‘mxnet’ is trying to combine the advantages of symbolic and imperative programming; is that where the name ‘mxnet’ comes from? It is exciting that I can pull a specific layer out by indexing when the network is built with ‘Sequential’. A tutorial on the ‘mxnet’ home page shows a similar example (https://gluon-crash-course.mxnet.io/nn.html). But a block with a custom forward function does not support indexing: net[0] gives 'ToySSD' object does not support indexing. Do I misunderstand something? – Blue Bird May 23 '18 at 03:27
  • And can I retrieve the intermediate output of a specific layer, or its weights, by using indexing? – Blue Bird May 23 '18 at 03:28
  • Another reason is that MXNet handles the derivative almost magically. In MATLAB, Maple or Mathematica, a symbolic equation is defined the moment we write it down. In MXNet, however, the symbolic expression is built up while the computation is being performed. When you put something in ‘mxnet.gluon.nn.Sequential()’ or ‘mxnet.gluon.nn.HybridSequential()’, MXNet does not know how to calculate the derivative until you show it how to perform the forward computation. The documentation contains an example at ‘https://gluon.mxnet.io/chapter01_crashcourse/autograd.html’. – Blue Bird Jun 03 '18 at 08:21