
I am trying to use image recognition to output a regression-style number, using a CNN built with the mxnet package in R.

I have used this as the basis of my analysis: https://rstudio-pubs-static.s3.amazonaws.com/236125_e0423e328e4b437888423d3821626d92.html

That is an image recognition analysis using a CNN with mxnet in R, so I have followed the same preprocessing steps (resizing, grayscaling) to prepare my data.

My "image" dataset looks like like this, I have 784 columns of pixels, and the last column is a numeric column with the "label" that I am trying to predict so it will be: 1132, 1491, 845, etc.

I have pixels in each cell with their numeric values, 784 columns of pixels and the last column is a numeric column with the "label" that I am trying to predict using the images
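
To illustrate, a quick check of the structure (STOPPING_TIME is the label column used in the partitioning code below):

## 784 pixel columns plus the numeric label column STOPPING_TIME
dim(image)                 # N rows x 785 columns
str(image$STOPPING_TIME)   # numeric labels such as 1132, 1491, 845, ...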

From there, I create training and testing sets:

library(pbapply)
library(caret)
## test/training partitions
training_index <- createDataPartition(image$STOPPING_TIME, p = .9, times = 1)
training_index <- unlist(training_index)
train_set <- image[training_index,]
dim(train_set)
test_set <- image[-training_index,]
dim(test_set)


## Fix train and test datasets
train_data <- data.matrix(train_set)
train_x <- t(train_data[, -785])
train_y <- train_data[,785]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_x))

test_data <- data.matrix(test_set)
test_x <- t(test_data[, -785])
test_y <- test_data[, 785]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))
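
As a quick sanity check (just inspection, mirroring the dim() calls above), both arrays should now report 28 x 28 x 1 x N:

dim(train_array)   # 28 28 1 N_train
dim(test_array)    # 28 28 1 N_test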

Now I get to the mxnet part, which is what is causing problems; I am not sure what I am doing wrong:

library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer 5x5 kernel and 20 filters.
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 2nd convolutional layer 5x5 kernel and 50 filters.
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5, 5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 2)
## Output
label <- mx.symbol.Variable("label")
NN_model <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fcl_2, shape = 0) - label))


## Set seed for reproducibility
mx.set.seed(100)


## Train on 1200 samples
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                     num.round = 30,
                                     array.batch.size = 100,
                                     initializer = mx.init.uniform(0.002),
                                     learning.rate = 0.05,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.rmse,
                                     epoch.end.callback = mx.callback.log.train.metric(100))

I get the error:

[00:30:08] D:\Program Files (x86)\Jenkins\workspace\mxnet\mxnet\dmlc-core\include\dmlc/logging.h:308: [00:30:08] d:\program files (x86)\jenkins\workspace\mxnet\mxnet\src\operator\tensor\./matrix_op-inl.h:134: Check failed: oshape.Size() == dshape.Size() (100 vs. 200) Target shape size is different to source. Target: (100,)
Source: (100,2)
Error in symbol$infer.shape(list(...)) : 
  Error in operator reshape9: [00:30:08] d:\program files (x86)\jenkins\workspace\mxnet\mxnet\src\operator\tensor\./matrix_op-inl.h:134: Check failed: oshape.Size() == dshape.Size() (100 vs. 200) Target shape size is different to source. Target: (100,)
Source: (100,2)

I can get it to work if I use

NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)

and keep the RMSE metric there, but the performance of my model doesn't improve after 30 iterations.


1 Answer


Your last fully connected layer, fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 2), creates an output of shape (batch_size, 2), so reshaping it yields 2 * batch_size values.

Then you are doing (mx.symbol.Reshape(fcl_2, shape = 0) - label), i.e. you are trying to subtract tensors of the following shapes: (200) - (100), which cannot work.
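
One way to see where the shapes diverge is to infer them from the symbol before training. A rough sketch, assuming a batch of 100 grayscale 28x28 images and that your mxnet build exposes mx.symbol.infer.shape (shapes are reported in R's column-major order):

## Inferred output shape of the last fully connected layer
shapes <- mx.symbol.infer.shape(fcl_2, data = c(28, 28, 1, 100))
shapes$out.shapes   # (2, 100): 200 values per batch, versus only 100 label values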

Instead, what you likely want to do is change your last fully connected layer to have only one hidden unit, fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 1), since you say you are trying to learn a network that predicts a single scalar output.
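
Concretely, the output part of the network would then look like this (the same loss construction as in your question, only num_hidden changed):

## One output unit -> shape (batch_size, 1), which reshapes cleanly to (batch_size,)
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 1)
label <- mx.symbol.Variable("label")
NN_model <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fcl_2, shape = 0) - label))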

leezu
  • Thanks, that did get the NN to start training, but now it only results in NaNs: [30] Train-rmse=NaN – Ic3MaN911 Jul 26 '17 at 21:22
  • I suggest you follow the introduction to debugging neural networks at http://russellsstewart.com/notes/0.html and make your question reproducible so it is easier to understand your issue – leezu Jul 27 '17 at 03:22
  • I made a reproducible example of what I am doing: https://stackoverflow.com/questions/45383926/image-recognition-with-scalar-output-using-cnn-mxnet-in-r – Ic3MaN911 Jul 29 '17 at 20:40
  • Can someone please help, dire need! – Ic3MaN911 Aug 06 '17 at 06:14