
I have been able to use nnet and neuralnet to predict values with a conventional backpropagation network, but have been struggling to do the same with MXNet in R, for several reasons.

This is the file (a simple CSV with headers; the columns have been normalized): https://files.fm/u/cfhf3zka

And this is the code I use:

require(mxnet)

filedata <- read.csv("example.csv")

# First three columns are the inputs; the fourth column is the target.
datain <- filedata[, 1:3]
dataout <- filedata[, 4]

# Convert to numeric matrices for MXNet (rownames.force takes the logical NA).
lcinm <- data.matrix(datain, rownames.force = NA)
lcoutm <- data.matrix(dataout, rownames.force = NA)
lcouta <- as.numeric(lcoutm)

# Define a 3-3-3-1 fully connected network with sigmoid activations.
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=3)
act1 <- mx.symbol.Activation(fc1, name="sigm1", act_type="sigmoid")
fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=3)
act2 <- mx.symbol.Activation(fc2, name="sigm2", act_type="sigmoid")
fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=3)
act3 <- mx.symbol.Activation(fc3, name="sigm3", act_type="sigmoid")
fc4 <- mx.symbol.FullyConnected(act3, name="fc4", num_hidden=1)
# Output layer (named "softmax", but it is a logistic regression output).
softmax <- mx.symbol.LogisticRegressionOutput(fc4, name="softmax")

mx.set.seed(0)
mxn <- mx.model.FeedForward.create(softmax,
                                   X = lcinm,
                                   y = lcouta,
                                   array.layout = "rowmajor",
                                   learning.rate = 0.01,
                                   eval.metric = mx.metric.rmse)

preds <- predict(mxn, lcinm)

predsa <- array(preds)

predsa

The console output is:

Start training with 1 devices
[1] Train-rmse=0.0852988247858687
[2] Train-rmse=0.068769514264606
[3] Train-rmse=0.0687647380075881
[4] Train-rmse=0.0687647164103567
[5] Train-rmse=0.0687647161066822
[6] Train-rmse=0.0687647160828069
[7] Train-rmse=0.0687647161241598
[8] Train-rmse=0.0687647160882147
[9] Train-rmse=0.0687647160594508
[10] Train-rmse=0.068764716079949
> preds <- predict(mxn, lcinm)
Warning message:
In mx.model.select.layout.predict(X, model) :
  Auto detect layout of input matrix, use rowmajor..

> predsa <-array(preds)
> predsa
   [1] 0.6776764 0.6776764 0.6776764 0.6776764 0.6776764 0.6776764 0.6776764 0.6776764 0.6776764
  [10] 0.6776764 0.6776764 0.6776764 0.6776764 0.6776764 0.6776764 0.6776764 0.6776764 0.6776764

So the network converges to an "average" but does not predict individual values. I have tried other architectures and learning rates to avoid this, but have never obtained output that varies across inputs.
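
A quick numeric check on the predsa vector above (a minimal sanity check using only the objects already defined) makes the collapse to a single value explicit:

# Spread of the predictions: effectively zero when the network
# has collapsed to predicting a single constant value.
sd(predsa)
range(predsa)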

David

1 Answer


I tried your example and it seems you are trying to predict a continuous output with LogisticRegressionOutput. I believe you should use LinearRegressionOutput instead. You can see examples of this here and a Julia example here. Also, since you are predicting a continuous output, it might be better to use a different activation function such as ReLU; see some reasons for this at this question.

With these changes, I produced the following code:

# Same 3-3-3-1 architecture, but with softrelu activations and a
# linear regression output to fit the continuous target.
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=3)
act1 <- mx.symbol.Activation(fc1, name="srelu1", act_type="softrelu")
fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=3)
act2 <- mx.symbol.Activation(fc2, name="srelu2", act_type="softrelu")
fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=3)
act3 <- mx.symbol.Activation(fc3, name="srelu3", act_type="softrelu")
fc4 <- mx.symbol.FullyConnected(act3, name="fc4", num_hidden=1)
# Output name kept as "softmax" so the rest of the code is unchanged.
softmax <- mx.symbol.LinearRegressionOutput(fc4, name="softmax")

mx.set.seed(0)
mxn <- mx.model.FeedForward.create(array.layout = "rowmajor",
                                   softmax,
                                   X = lcinm,
                                   y = lcouta,
                                   learning.rate=1,
                                   eval.metric=mx.metric.rmse,
                                   num.round = 100)

preds <- predict(mxn, lcinm)

predsa <- array(preds)

require(ggplot2)
# Predicted vs. actual values; points on the 45-degree line are perfect fits.
# I() keeps alpha as a fixed transparency rather than a mapped aesthetic.
qplot(x = dataout, y = predsa, geom = "point", alpha = I(0.6)) +
  geom_abline(slope = 1)
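
As a numeric complement to the plot, you can also compare predictions with the actual targets directly (a minimal sketch using the lcouta and predsa objects defined above):

# RMSE between predictions and the actual normalized targets.
sqrt(mean((predsa - lcouta)^2))
# Correlation between predicted and actual values; the constant
# prediction from the question would yield NA here (zero variance).
cor(predsa, lcouta)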

This gives me a steadily decreasing training error:

Start training with 1 devices
[1] Train-rmse=0.0725415842873665
[2] Train-rmse=0.0692660343340093
[3] Train-rmse=0.0692562284995407
...
[97] Train-rmse=0.048629236911287
[98] Train-rmse=0.0486272021266279
[99] Train-rmse=0.0486251858007309
[100] Train-rmse=0.0486231872849457

And the predicted outputs start to align with the actual outputs, as demonstrated in this plot:

[plot: predicted vs. actual values scattered around the y = x reference line]

niczky12
  • I have marked your solution as correct since you have shown the crossplot, and that is enough for me, but it raises more questions: why a linear regression output and not a curved one such as a sigmoid (or why does the logistic output not work at all)? And why on earth is this network a thousand times slower than nnet and neuralnet (even with sigmoid activations), all running on the same CPU? – David Mar 13 '17 at 16:59