I don't have a clear idea of how labels for a softmax classifier should be shaped.
What I could gather from my experiments is that one option is a scalar label indicating the index of the class in the probability output, while another is a 2D label whose rows are class probabilities, i.e. a one-hot encoded vector such as c(1, 0, 0).
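For concreteness, these are the two label shapes I have been experimenting with (a minimal sketch for the three-class network defined below; the concrete values are just examples):

# Option 1: a scalar label holding the class index
actor_train.y <- 2

# Option 2: a "one-hot" style label with one entry per class
actor_train.y <- matrix(c(0, 0, 1), ncol = 1)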
What puzzles me, though, is the following:
- I can use scalar label values that go beyond the valid index range, like 4 in my example below, without any warning or error. Why is that?
- When my label is a negative scalar or an array containing a negative value, the model converges to a uniform probability distribution over the classes. For example, is it expected that
  actor_train.y <- matrix(c(0, -1, 0), ncol = 1)
  results in equal probabilities in the softmax output? I am trying to use the MXNet softmax classifier for policy gradient reinforcement learning, and my negative rewards lead to the issue above: uniform probabilities. Is that expected?
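To make it concrete how a negative reward turns into a negative label entry, here is a simplified, illustrative sketch (the action, reward and label variables are placeholders, not my actual training code). My full minimal example follows below.

# hypothetical sketch of how a negative reward produces a negative label entry
action <- 2                                # index of the action taken (1..3)
reward <- -1                               # negative reward from the environment
label  <- rep(0, 3)
label[action] <- reward                    # one-hot label scaled by the reward
actor_train.y <- matrix(label, ncol = 1)   # yields c(0, -1, 0)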
require(mxnet)
actor_initializer <- mx.init.Xavier(rnd_type = "gaussian", factor_type = "avg", magnitude = 0.0001)
actor_nn_data <- mx.symbol.Variable('data')
actor_nn_label <- mx.symbol.Variable('label')
device.cpu <- mx.cpu()
# NN architecture
actor_fc3 <- mx.symbol.FullyConnected( data = actor_nn_data , num_hidden = 3 )
actor_output <- mx.symbol.SoftmaxOutput( data = actor_fc3 , label = actor_nn_label , name = 'actor' )
# custom evaluation metric: cross-entropy, -sum(label * log(pred))
crossentfunc <- function(label, pred) { -sum(label * log(pred)) }
actor_loss <- mx.metric.custom(feval = crossentfunc, name = "log-loss")
# initialize NN
actor_train.x <- matrix(rnorm(11), nrow = 1)
actor_train.y <- 0  # also tried: 1, 2, 3, -3, and matrix(c(0, 0, -1), ncol = 1)
rm(actor_model)  # drop any model left over from a previous run
actor_model <- mx.model.FeedForward.create(
  symbol = actor_output, X = actor_train.x, y = actor_train.y, ctx = device.cpu,
  num.round = 100, array.batch.size = 1, optimizer = 'adam', eval.metric = actor_loss,
  clip_gradient = 1, wd = 0.01, initializer = actor_initializer, array.layout = "rowmajor"
)
# with a negative value in the label (see the commented alternatives above),
# this returns roughly equal probabilities for all three classes
predict(actor_model, actor_train.x, array.layout = "rowmajor")