
I am trying to use the mxnet package in R with a CNN to predict a scalar output (in my case, wait time) from images.

However, when I do this, the model predicts the same number for every image (probably just the average of all of the training labels). How do I get it to predict the scalar output correctly?

My images have already been pre-processed: greyscaled, converted into the pixel format below, and scaled to 28 x 28 (I have also tried different sizes with no effect). A sketch of this kind of pipeline follows.
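For illustration only, a minimal sketch of the kind of pre-processing I mean, assuming the magick package (the package choice, the preprocess_image helper, and the file path are placeholders, not my actual code):

library(magick)  # assumption: magick for image loading/resizing

## Hypothetical helper: greyscale an image, force 28 x 28, scale pixels to [0, 1]
preprocess_image <- function(path) {
  img <- image_scale(image_read(path), "28x28!")   # "!" forces the exact geometry
  bmp <- image_data(img, channels = "gray")        # 1 x 28 x 28 raw greyscale bitmap
  matrix(as.integer(bmp), nrow = 28, ncol = 28) / 255
}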

I am essentially using images to predict wait times, which is why my train_y is the current wait time in seconds. When I leave train_y in raw seconds, the algorithm just predicts the same number.

However, when I transform train_y to [0, 1] by dividing by a guessed maximum value (20000), the CNN does output different numbers. But when I scale those predictions back up by multiplying by 20000, I get negative numbers and values that are far too skewed, giving poor results. The negative numbers especially make no sense: all of my train_y values are positive, and since I am dealing with time, negative values are impossible. This scaling is sketched below.
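Roughly, the scaling I describe looks like this:

scale_max <- 20000                       # my guessed maximum wait time
train_y_scaled <- train_y / scale_max    # wait times mapped into [0, 1]
## ... train the model on train_y_scaled, then predict ...
pred_seconds <- pred * scale_max         # predictions mapped back to seconds
## this is where the negative and heavily skewed values appear; clamping with
## pmax(pred_seconds, 0) would hide the symptom rather than fix the model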

I have also swept the learning rate from 0.05 through 0.01, 0.001, 0.0001, 0.00001, and so on down to 2e-8, with no effect on the model. I have also played around with the initializer.

I have also played around with momentum, changing it from 0.9 to 0.95, with no effect on the model.

Here is my reproducible code:

library(caret)  # createDataPartition() below comes from caret

set.seed(0)

## Stand-in data: 784 random "pixel" columns plus a wait-time column
df <- data.frame(replicate(784, runif(7538)))
df$waittime <- 1000 * runif(7538)


## 90/10 train/test split
training_index <- createDataPartition(df$waittime, p = .9, times = 1)
training_index <- unlist(training_index)

train_set <- df[training_index, ]
dim(train_set)
test_set <- df[-training_index, ]
dim(test_set)


## Fix train and test datasets
train_data <- data.matrix(train_set)
train_x <- t(train_data[, -785])   # transpose: one column per sample
train_y <- train_data[, 785]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_array))   # width x height x channel x samples


test_data <- data.matrix(test_set)
test_x <- t(test_data[, -785])
test_y <- test_data[, 785]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))




library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer 5x5 kernel and 20 filters.
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 2nd convolutional layer 5x5 kernel and 50 filters.
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5,5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 1)
## Output: linear regression head for the scalar target
#NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
label <- mx.symbol.Variable("label")
#NN_model <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fcl_2, shape = 0) - label))
NN_model <- mx.symbol.LinearRegressionOutput(data = fcl_2, label = label)




## Didn't work well: predicted the same number continuously regardless of image
## Train on samples
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                     # ctx = device,
                                     num.round = 30,
                                     array.batch.size = 100,
                                     # initializer = mx.init.uniform(0.002),
                                     initializer = mx.init.Xavier(factor_type = "in", magnitude = 2.34),
                                     learning.rate = 0.00001,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.rmse)
                                     # epoch.end.callback = mx.callback.log.train.metric(100)



pred <- predict(model, test_array)
## gives the same numeric output for every image,
## or, when train_y is scaled to [0,1], very poor responses including negative numbers
  • Deep learning models tend to work better when the data is centered on the mean (i.e. shifted so the mean is 0). Try pre-processing your training data into standard z-scores, on both your image inputs and your regression output. – j314erre Aug 09 '17 at 14:22
  • That did not work, but thanks for the suggestion. I think I am doing something wrong with my syntax or preparation, so let me know if someone catches it. Still not sure why it is not working. – Ic3MaN911 Aug 19 '17 at 04:21
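For reference, a minimal sketch of the z-scoring j314erre suggests above, using statistics computed on the training set only (variable names follow the question's code; applying the inverse transform to predictions is an assumption about how it would be used):

## Standardize inputs and target to z-scores (training-set statistics only)
x_mean <- mean(train_x); x_sd <- sd(train_x)
train_x_z <- (train_x - x_mean) / x_sd
test_x_z  <- (test_x - x_mean) / x_sd

y_mean <- mean(train_y); y_sd <- sd(train_y)
train_y_z <- (train_y - y_mean) / y_sd

## After predicting on the standardized scale, undo the transform:
## pred_seconds <- pred_z * y_sd + y_mean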

1 Answer


I ran your example, and I think the model itself is fine. I checked that by substituting your input with the MNIST input taken from the official Kaggle tutorial.

After training your model with your training parameters on the MNIST train.array, I ran prediction on the MNIST test.array and received a good distribution of results.

If I use the MNIST-trained model on your test_array data, I still receive a nice distribution of predictions.

But as soon as I train your model on your randomly generated train_array and try to predict results from your test_array or the MNIST test.array, I get very similar predictions for all items; the difference starts only after the 3rd digit after the decimal point. One way to see this is sketched below.
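For example, the spread can be quantified like this (an illustrative check, not part of the original code):

pred <- predict(model, test_array)
summary(as.numeric(pred))  # five-number summary of the predictions
sd(as.numeric(pred))       # close to zero here, i.e. a nearly constant output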

I can only assume that the network cannot find any pattern in the white noise (the randomly generated data). I can make the differences bigger by setting the weight decay parameter (wd) to something large, like wd = 10, but that is surely a bad idea.

If your real input data is different from the random data in this example, then take a closer look at its pre-processing; maybe there is a bug there. A few quick checks are sketched below.
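A few cheap sanity checks on the real data before training might look like this (assuming the train_array / train_y names from the question):

range(train_array)                    # pixels in the expected range, e.g. [0, 1]?
any(!is.finite(train_array))          # NA/NaN/Inf introduced by the pipeline?
sd(apply(train_array, 4, mean))       # per-image means should vary; ~0 suggests broken images
cor(apply(train_array, 4, mean), train_y)  # any crude brightness-vs-label signal at all?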

– Sergei