
So I am trying to use the mxnet package in R to train a CNN that predicts a scalar output (in my case, wait time) from an image.

However, when I do this, the model predicts the same number for every image (which is probably just the average of all the targets). How do I get it to predict the scalar output correctly?

Also, my images have already been pre-processed by greyscaling them and converting them into the pixel format below. I am essentially using images to predict wait times, which is why my train_y is the current wait time in seconds and why I didn't convert it into a [0,1] range. I would prefer a regression-type output, i.e. a scalar prediction of the wait time for each image.

What other ways would you recommend to tackle this problem? I am not sure if my approach is correct.

Here is my reproducible code:

library(caret)   # needed for createDataPartition below

set.seed(0)

df <- data.frame(replicate(784,runif(7538)))
df$waittime <- 1000*runif(7538)


training_index <- createDataPartition(df$waittime, p = .9, times = 1)
training_index <- unlist(training_index)

train_set <- df[training_index,]
dim(train_set)
test_set <- df[-training_index,]
dim(test_set)


## Fix train and test datasets
train_data <- data.matrix(train_set)
train_x <- t(train_data[, -785])
train_y <- train_data[,785]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_array))


test_data <- data.matrix(test_set)
test_x <- t(test_data[, -785])
test_y <- test_data[, 785]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))




library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer 5x5 kernel and 20 filters.
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2,2 ))
## 2nd convolutional layer 5x5 kernel and 50 filters.
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5,5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 1)
## Output
#NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
label <- mx.symbol.Variable("label")
#NN_model <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fcl_2, shape = 0) - label))
NN_model <- mx.symbol.LinearRegressionOutput(fcl_2)


## Device used. Sadly not the GPU :-(
#device <- mx.gpu
#Didn't work well, predicted same number continuously regardless of image
## Train the model
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                     #                                     ctx = device,
                                     num.round = 30,
                                     array.batch.size = 100,
                                     initializer = mx.init.uniform(0.002),
                                     learning.rate = 0.00001,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.rmse,
                                     epoch.end.callback = mx.callback.log.train.metric(100))



pred <- predict(model, test_array)
#gives the same numeric output 
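A quick sanity check, as a minimal sketch: if the standard deviation of the predictions is near zero while the test labels vary widely, the network has collapsed to (roughly) the mean of train_y.

## Sanity check: compare the spread of the predictions with the labels.
sd(as.numeric(pred))   # near zero if every image gets the same prediction
sd(test_y)             # spread of the actual wait times
mean(train_y)          # the constant the network tends to collapse to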
Ic3MaN911
2 Answers


Just modify your code a little: scale train_y into [0, 1] as well, and use initializer = mx.init.Xavier(factor_type = "in", magnitude = 2.34).

library(caret)

set.seed(0)

df <- data.frame(replicate(784, runif(7538)))
df$waittime <- runif(7538)

training_index <- createDataPartition(df$waittime, p = .9, times = 1)
training_index <- unlist(training_index)

train_set <- df[training_index, ]
dim(train_set)
test_set <- df[-training_index, ]
dim(test_set)

## Fix train and test datasets
train_data <- data.matrix(train_set)
train_x <- t(train_data[,-785])
train_y <- train_data[, 785]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_array))

test_data <- data.matrix(test_set)
test_x <- t(test_data[, -785])
test_y <- test_data[, 785]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))

library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer 5x5 kernel and 20 filters.
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 2nd convolutional layer 5x5 kernel and 50 filters.
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5, 5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 1)
## Output
#NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
label <- mx.symbol.Variable("label")
#NN_model <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fcl_2, shape = 0) - label))
NN_model <- mx.symbol.LinearRegressionOutput(fcl_2)

mx.set.seed(0)
model <- mx.model.FeedForward.create(NN_model,
                                     X = train_array,
                                     y = train_y,
                                     num.round = 4,
                                     array.batch.size = 64,
                                     initializer = mx.init.Xavier(factor_type = "in", magnitude = 2.34),
                                     learning.rate = 0.00001,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.rmse)

pred <- predict(model, test_array)

pred[1,1:10]
# [1] 0.4859098 0.4865469 0.5671642 0.5729486 0.5008956 0.4962234 0.4327411 0.5478653 0.5446281 0.5707113
Qiang Kou
  • So the reason I kept train_y that way is that it is what I want as the output when I predict. Is there any way to retain that? train_y is essentially the wait time in seconds, which is what I want as the output. If I convert it, it will effectively be meaningless, especially because standardizing train_y will give a different result than standardizing test_y, since they have different maxima and minima. Does that make sense? – Ic3MaN911 Aug 03 '17 at 02:13
  • I am sorry, I don't understand what the problem with scaling `train_y` is. You can just multiply it by `1000` in this case. – Qiang Kou Aug 03 '17 at 16:06
  • So I have tried it with scaling: I divided it by 20000, used that to predict, then multiplied the predictions by 20000, and it resulted in a very poor model, only 9% accuracy. Some of the predicted numbers were even negative (which is impossible because I am predicting time), even though the range of my scaled numbers is [0.00005, 0.73720]. – Ic3MaN911 Aug 03 '17 at 23:11
  • I think we are dealing with a regression problem. What do you mean by "9% accuracy"? – Qiang Kou Aug 03 '17 at 23:44
  • Yes, I want a regression-type output; sorry, I mentioned that in the previous post but forgot to include it here. As an accuracy measure, I test whether the predicted output is within 180 seconds of the actual waiting time. I also use MAPE and RMSE, and both came out worse than the model that just outputs the same number. – Ic3MaN911 Aug 04 '17 at 00:17
  • Sorry to bother you again, but does that make sense? How do I take that into account? I want a scalar/regression output. – Ic3MaN911 Aug 06 '17 at 05:51
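One way to handle the scaling discussed in the comments above, as a minimal sketch (illustrative only, assuming wait times in seconds): compute the scaling constants on train_y only and reuse the same constants to map the predictions back to seconds.

## Sketch: min-max scale the target using training-set constants only,
## then invert the same scaling on the predictions.
y_min <- min(train_y)
y_max <- max(train_y)
train_y_scaled <- (train_y - y_min) / (y_max - y_min)

## ... train the model on train_y_scaled instead of train_y ...

pred_scaled  <- predict(model, test_array)
pred_seconds <- pred_scaled * (y_max - y_min) + y_min  # back to seconds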

It appears that your network is collapsing, which can have a number of causes. I would try the following modifications:

  • Use ReLU activation instead of tanh. ReLU has proven to be a much more robust activation in convolutional networks than sigmoid or tanh.
  • Use batch normalization at the input of your convolutional layers (see the batch normalization paper); a sketch combining this with ReLU follows this list.
  • Divide your target range into bins and use softmax. If you must have regression, consider a separate regression network for each bin and select the correct regression net based on the output of the softmax. Cross-entropy loss has shown more success in learning highly non-linear functions.
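A minimal sketch of the first two suggestions using mxnet's R symbol API, reusing the layer sizes from the question (here batch normalization is placed after each convolution and before the activation, one common placement; treat it as illustrative, not a tuned architecture):

## Sketch: the question's topology with BatchNorm + ReLU instead of plain tanh.
mx_data <- mx.symbol.Variable('data')
## 1st convolutional block: conv -> batch norm -> ReLU -> max pooling
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
bn_1   <- mx.symbol.BatchNorm(data = conv_1)
relu_1 <- mx.symbol.Activation(data = bn_1, act_type = "relu")
pool_1 <- mx.symbol.Pooling(data = relu_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 2nd convolutional block
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5, 5), num_filter = 50)
bn_2   <- mx.symbol.BatchNorm(data = conv_2)
relu_2 <- mx.symbol.Activation(data = bn_2, act_type = "relu")
pool_2 <- mx.symbol.Pooling(data = relu_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## Fully connected layers and regression output (unchanged from the question)
flat   <- mx.symbol.Flatten(data = pool_2)
fcl_1  <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
relu_3 <- mx.symbol.Activation(data = fcl_1, act_type = "relu")
fcl_2  <- mx.symbol.FullyConnected(data = relu_3, num_hidden = 1)
NN_model <- mx.symbol.LinearRegressionOutput(fcl_2)

The training call from the question can then be reused unchanged with this NN_model.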
Sina Afrooze