I am using BucketingModule to train multiple small models (bots) together, with bot_id as the bucket key. However, each bot has its own set of target labels/classes, and hence a softmax layer of a different size.
Is there a way to train such a model in MXNet so that the weights of every layer except the last one (softmax) are shared across all the bots?
How would I initialize such a model using the sym_gen method?
If, in the sym_gen method, I set num_hidden for the softmax layer per bot from size_dict[bot], i.e.,
pred = mx.sym.FullyConnected(data=pred, num_hidden=len(size_dict[bot]), name='pred')
pred = mx.sym.SoftmaxOutput(data=pred, label=label, name='softmax')
I get the error:
Inferred shape does not match shared_exec.arg_array's shape
which makes sense, as each bot has a different number of target classes.
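
To make the mismatch concrete, here is a minimal plain-Python sketch of what I think is happening. It makes no MXNet calls; the contents of size_dict and the value of FEATURE_DIM are made-up values for illustration. It relies only on the fact that a FullyConnected weight has shape (num_hidden, input_dim), so a layer named 'pred' ends up with a differently shaped 'pred_weight' in each bucket.

```python
# Hypothetical label sets per bot; only their sizes matter here.
size_dict = {
    "bot_a": ["greet", "bye", "help"],
    "bot_b": ["yes", "no"],
}

# Assumed width of the shared layer that feeds 'pred' (illustrative value).
FEATURE_DIM = 128


def pred_weight_shape(bot):
    """Shape a FullyConnected weight would take: (num_hidden, input_dim)."""
    return (len(size_dict[bot]), FEATURE_DIM)


# Every bucket names the layer 'pred', so all buckets would have to share
# a single 'pred_weight' array, yet the required shapes differ per bot.
shapes = {bot: pred_weight_shape(bot) for bot in size_dict}
conflict = len(set(shapes.values())) > 1

for bot, shape in shapes.items():
    print(bot, "pred_weight shape:", shape)
print("shape conflict across buckets:", conflict)
```

So the question is how to keep everything up to 'pred' shared while letting this last layer differ per bot.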