Let's say we have a pretrained Gluon model for a classification task:
>>> import mxnet as mx
>>> net = mx.gluon.nn.HybridSequential()
>>> net.add(mx.gluon.nn.Conv2D(channels=6, kernel_size=5, padding=2, activation='sigmoid'))
>>> net.add(mx.gluon.nn.MaxPool2D(pool_size=2, strides=2))
>>> net.add(mx.gluon.nn.Flatten())
>>> net.add(mx.gluon.nn.Dense(units=10))
>>> net.collect_params()
hybridsequential0_ (
  Parameter conv0_weight (shape=(6, 0, 5, 5), dtype=<class 'numpy.float32'>)
  Parameter conv0_bias (shape=(6,), dtype=<class 'numpy.float32'>)
  Parameter dense0_weight (shape=(10, 0), dtype=<class 'numpy.float32'>)
  Parameter dense0_bias (shape=(10,), dtype=<class 'numpy.float32'>)
)
To fine-tune this convolutional network, we want to freeze all the blocks except Dense.

First, recall that the collect_params method accepts a regexp string to select specific block parameters by their names (or prefixes; the prefix parameter of Conv2D, Dense, or any other Gluon (hybrid) block). By default, a prefix is derived from the class name: the first Conv2D block gets the prefix conv0_, the second conv1_, and so on. Moreover, collect_params returns an instance of mxnet.gluon.parameter.ParameterDict, which has a setattr method that sets an attribute on every parameter in the dict.
Solution:
>>> conv_params = net.collect_params('(?!dense).*')
>>> conv_params.setattr('grad_req', 'null')
or simply
>>> net.collect_params('(?!dense).*').setattr('grad_req', 'null')
Here we exclude all the parameters whose names match dense, leaving only the conv parameters, and set their grad_req attribute to 'null'. Now, training the model net with mxnet.gluon.Trainer will update only the dense parameters.
It is more convenient to have a pretrained model with separate attributes indicating specific blocks, e.g. the features block, anchor generators etc. In our case, we have a convolutional network that extracts features and passes them to an output block.
class ConvNet(mx.gluon.nn.HybridBlock):
    def __init__(self, n_classes, params=None, prefix=None):
        super().__init__(params=params, prefix=prefix)
        with self.name_scope():
            self.features = mx.gluon.nn.HybridSequential()
            self.features.add(mx.gluon.nn.Conv2D(channels=6, kernel_size=5, padding=2,
                                                 activation='sigmoid'))
            self.features.add(mx.gluon.nn.MaxPool2D(pool_size=2, strides=2))
            self.features.add(mx.gluon.nn.Flatten())
            self.output = mx.gluon.nn.Dense(units=n_classes)

    def hybrid_forward(self, F, x):
        x = self.features(x)
        return self.output(x)
With this declaration, we no longer need regexps to access the required blocks:
>>> net = ConvNet(n_classes=10)
>>> net.features.collect_params().setattr('grad_req', 'null')
Gluon CV models follow exactly this pattern. Check the documentation of the model you want and choose the blocks to freeze. If the docs are not helpful, run collect_params to list all the parameters, filter out the ones you want to keep trainable with a regexp, and set the grad_req attribute of the returned parameters to 'null'.