
I am new to neural networks and to the mxnet package in R. I want to do a logistic regression on my predictors, since my observations are probabilities varying between 0 and 1. I'd like to weight my observations by a vector obsWeights I have, but I'm not sure where to implement the weights. There seems to be a weight= argument in mx.symbol.FullyConnected, but if I try weight=obsWeights I get the following error message:

Error in mx.varg.symbol.FullyConnected(list(...)) : 
  Cannot find argument 'weight', Possible Arguments:
----------------
num_hidden : int, required
  Number of hidden nodes of the output.
no_bias : boolean, optional, default=False
  Whether to disable bias parameter.

How should I proceed to weight my observations? Here is my code at the moment.

# Prepare data
train.mm = model.matrix(obs ~ . , data = train_data)
train_label = train_data$obs

# Normalize
train.mm = apply(train.mm, 2, function(x) (x-min(x))/(max(x)-min(x)))

# Create MXDataIter compatible iterator
batch_size = 128
train.iter = mx.io.arrayiter(data=t(train.mm), label=train_label, 
                               batch.size=batch_size, shuffle=T)

# Symbolic model definition
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data=data, num.hidden=128, name='fc1')
act1 = mx.symbol.Activation(data=fc1, act.type='relu', name='act1')
final = mx.symbol.FullyConnected(data=act1, num.hidden=1, name='final')
logistic = mx.symbol.LogisticRegressionOutput(data=final, name='logistic')

# Run model
mxnet_train = mx.model.FeedForward.create(
                symbol = logistic,
                X = train.iter,
                initializer = mx.init.Xavier(rnd_type = 'gaussian', factor_type = 'avg', magnitude = 2),
                num.round = 25)

1 Answer


Assigning the fully connected weight argument is not what you want to do in any case. That weight is a reference to the parameters of the layer, i.e., the matrix you multiply the inputs by to get the output values. These are the parameter values you're trying to learn.

If you want to make some samples matter more than others, then you'll need to adjust the loss function: multiply each sample's usual loss by its weight, so that low-weight samples contribute less to the overall average loss and high-weight samples contribute more.
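Roughly, the per-batch objective then becomes something like

(w_1 * loss_1 + w_2 * loss_2 + ... + w_N * loss_N) / N,

where loss_i is the ordinary per-sample loss and w_i is your observation weight (you could also divide by the sum of the weights rather than N).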

I do not believe the standard MXNet loss functions have a spot for assigning weights (that is, LogisticRegressionOutput won't cover this). However, you can make your own cost function that does. This would involve first passing your final layer through a sigmoid activation function to generate the usual logistic regression output value, then passing that into the loss function you define. You could do squared error, but for logistic regression you'll probably want to use the cross entropy function:

-(l * log(y) + (1 - l) * log(1 - y)),

where l is the label and y is the predicted value.

Ideally, you'd write a symbol with an efficient definition of the gradient (MXNet has a cross entropy function, but it's for softmax input, not a binary output; you could translate your output to two outputs with softmax as an alternative, but that seems less convenient here), but the easiest path would be to let MXNet do its autodiff on it. Then you multiply that cross entropy loss by the weights.

I haven't tested this code, but you'd ultimately have something like this (this is what you'd do in Python; it should be similar in R):

label = mx.sym.Variable('label')
weights = mx.sym.Variable('weights')  # per-sample observation weights
# sigmoid turns the final layer into a probability, as LogisticRegressionOutput would
out = mx.sym.Activation(data=final, act_type='sigmoid')
# binary cross entropy (note the minus sign: MakeLoss minimizes its input)
ce = -(label * mx.sym.log(out) + (1 - label) * mx.sym.log(1 - out))
# weight each sample's loss, then average over the batch
loss = mx.sym.MakeLoss(weights * ce, normalization='batch')

Then you want to input your weight vector into the weights Variable along with your normal input data and labels.
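With the Python Module API that would look roughly like the sketch below (also untested). The names 'data', 'weights' and 'label' only need to match the Variables defined above, and X, y and obs_weights are placeholders for your own arrays:

import mxnet as mx

# X: (n_samples, n_features); y and obs_weights: (n_samples, 1) so they
# broadcast against the single output unit of the network
train_iter = mx.io.NDArrayIter(data={'data': X, 'weights': obs_weights},
                               label={'label': y},
                               batch_size=128, shuffle=True)

mod = mx.mod.Module(symbol=loss,
                    data_names=('data', 'weights'),  # extra input for the weights
                    label_names=('label',))
mod.fit(train_iter, num_epoch=25,
        initializer=mx.init.Xavier(),
        optimizer='sgd', optimizer_params={'learning_rate': 0.01},
        eval_metric=mx.metric.Loss())  # report the custom loss value while training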

As an added tip, an mxnet network with a custom loss via MakeLoss outputs the loss, not the prediction. You'll probably want both in practice, in which case it's useful to group the loss with a gradient-blocked version of the prediction so that you can retrieve both. You'd do that like this:

pred_loss = mx.sym.Group([mx.sym.BlockGrad(out), loss])
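If you then build the module on pred_loss instead of loss (same caveat: untested, names following the sketch above), the network has two outputs and the predictions come back as the first one:

mod = mx.mod.Module(symbol=pred_loss,
                    data_names=('data', 'weights'),
                    label_names=('label',))
# ... fit as before ...
outputs = mod.predict(train_iter)   # list of NDArrays: [predictions, per-sample loss]
preds = outputs[0].asnumpy()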
  • That makes a lot of sense, I wasn't aware that you could define your own loss functions. Let me do some tests and then I'll provide my results so that you can edit your code accordingly and I can accept this answer. Thanks! – jgadoury Feb 20 '17 at 15:38