I am trying to use the mxnet package in R with a CNN to predict a scalar output (in my case, wait time) from images.
However, when I do this, I get the same output for every image (it predicts the same number, which is probably just the average of all the training targets). How do I get it to predict the scalar output correctly?
My images have already been pre-processed by greyscaling them, converting them into the pixel format below, and scaling them to 28 x 28 (I have also tried different sizes, with no effect).
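For reference, the kind of preprocessing I mean looks roughly like this (imager is used here purely as an illustration, and the file path is just a placeholder):

library(imager)
img <- load.image("frame_0001.png")           # placeholder path
img <- grayscale(img)                         # drop colour channels
img <- resize(img, size_x = 28, size_y = 28)  # scale to 28 x 28
pixels <- as.numeric(img)                     # 784 values in [0, 1], i.e. one row of df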
I am essentially using images to predict wait times, which is why my train_y is the current wait time in seconds. With this approach, leaving train_y as the raw wait time in seconds, the algorithm just predicts the same number.
However, when I scale train_y to [0, 1] by dividing by a guessed maximum value (20000), the CNN does output different numbers, but when I scale those predictions back up by multiplying by 20000, I get negative values and numbers that are far too skewed, so the model performs poorly. The negative numbers especially make no sense: all of my train_y values are positive, and since I am dealing with time, negative values are impossible.
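Concretely, the [0, 1] scaling I tried looks like this (train_y, model, and test_array refer to the reproducible code below; max_wait is just my guessed cap):

max_wait <- 20000                        # guessed maximum wait time in seconds
train_y_scaled <- train_y / max_wait     # target now lies in [0, 1]
## ... train exactly as in the code below, but with y = train_y_scaled ...
pred_scaled <- predict(model, test_array)
pred_seconds <- pred_scaled * max_wait   # some of these come back negative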
I have also tried learning rates of 0.05, 0.01, 0.001, 0.0001, 0.00001, and so on down to 2e-8, with no effect on the model, and I have experimented with the initializer. I have also tried momentum values of 0.9 and 0.95, again with no effect.
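The sweep was essentially a loop like the one below (shown only as a sketch; NN_model, train_array, train_y, and test_array come from the reproducible code below):

for (lr in c(0.05, 0.01, 0.001, 1e-4, 1e-5, 1e-6, 1e-7, 2e-8)) {
  m <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                   num.round = 30, array.batch.size = 100,
                                   initializer = mx.init.Xavier(factor_type = "in",
                                                                magnitude = 2.34),
                                   learning.rate = lr, momentum = 0.9, wd = 0.00001,
                                   eval.metric = mx.metric.rmse)
  print(range(predict(m, test_array)))   # the range collapses to essentially one value
}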
Here is my reproducible code:
library(caret)   # createDataPartition() below comes from caret
set.seed(0)
## Simulated stand-in for the real data: 784 pixel columns plus the wait-time target
df <- data.frame(replicate(784, runif(7538)))
df$waittime <- 1000 * runif(7538)
training_index <- createDataPartition(df$waittime, p = .9, times = 1)
training_index <- unlist(training_index)
train_set <- df[training_index,]
dim(train_set)
test_set <- df[-training_index,]
dim(test_set)
## Fix train and test datasets
train_data <- data.matrix(train_set)
train_x <- t(train_data[, -785])   # 784 x n matrix of pixel values
train_y <- train_data[, 785]       # wait times in seconds
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_array))   # 28 x 28 x 1 x n
test_data <- data.matrix(test_set)
test_x <- t(test_data[, -785])
test_y <- test_data[, 785]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))
library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer 5x5 kernel and 20 filters.
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 2nd convolutional layer 5x5 kernel and 50 filters.
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5, 5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 1)
## Output
#NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
label <- mx.symbol.Variable("label")
#NN_model <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fcl_2, shape = 0) - label))
NN_model <- mx.symbol.LinearRegressionOutput(fcl_2)
#Didn't work well, predicted same number continuously regardless of image
## Train on samples
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                     # ctx = device,
                                     num.round = 30,
                                     array.batch.size = 100,
                                     # initializer = mx.init.uniform(0.002),
                                     initializer = mx.init.Xavier(factor_type = "in", magnitude = 2.34),
                                     learning.rate = 0.00001,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.rmse)
                                     # epoch.end.callback = mx.callback.log.train.metric(100)
pred <- predict(model, test_array)
# gives the same numeric output for every image,
# or, when train_y is scaled to [0, 1], very poor predictions including negative numbers
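To show what I mean by "the same numeric output", a quick check like this makes the collapse visible (the mean-only baseline is only there for comparison):

summary(as.numeric(pred))                   # nearly identical values for every image
sqrt(mean((as.numeric(pred) - test_y)^2))   # RMSE of the CNN on the test set
sqrt(mean((mean(train_y) - test_y)^2))      # RMSE of simply predicting the training mean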