I have a big dataset (around 20GB for training and 2GB for testing) and I want to use MXNet with R. Because the data does not fit in memory, I searched for a custom iterator that loads the training and test sets in batches, and I found this solution.

Now I can train the model using the code on that page, but the problem appears when I read the test set with the same iterator, as follows:

test.iter <- CustomCSVIter$new(iter = NULL, data.csv = "test.csv", data.shape = 480, batch.size = batch.size)

Then the prediction command does not work, and the page gives no example of prediction:

preds <- predict(model, test.iter)

So my specific question is: if I build my model using the code on that page, how can I read my test set and predict its labels for evaluation? My test and training sets are in this format.

Thank you for your help

Mohammad

1 Answer

It actually works exactly as you described. You just call predict with the model and the iterator:

preds = predict(model, test.iter)

The only trick is that the predictions are laid out column-wise: each column holds the output for one sample. If you take the whole example you are referring to, run it, and add the following lines:

test.iter <- CustomCSVIter$new(iter = NULL, data.csv = "mnist_train.csv", data.shape = 28, batch.size = batch.size)
preds = predict(model, test.iter)

preds[,1] # the column index selects which sample to look at

You receive:

 [1] 5.882561e-11 2.826923e-11 7.873914e-11 2.760162e-04 1.221306e-12 9.997239e-01 4.567645e-11 3.177564e-08 1.763889e-07 3.578671e-09

This shows the softmax output for the first element of the training set. If you try to print everything by just typing preds, you will see only empty values because of the RStudio print limit of 1000; the real data never gets a chance to appear.
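
To turn the column-wise output into predicted labels, something along these lines should work. This is only a sketch: the matrix below is made up to stand in for preds, and I assume the classes are numbered from 0 (as with MNIST digits), so row 1 corresponds to class 0. The true.label vector is hypothetical as well and is there only to show how an accuracy figure could be computed.

```r
# Made-up stand-in for the matrix returned by predict():
# 10 rows (classes 0-9) x 3 columns (samples). Values are illustrative only.
preds <- matrix(0.01, nrow = 10, ncol = 3)
preds[6, 1] <- 0.91   # sample 1: highest probability in row 6, i.e. class 5
preds[1, 2] <- 0.91   # sample 2: highest probability in row 1, i.e. class 0
preds[10, 3] <- 0.91  # sample 3: highest probability in row 10, i.e. class 9

dim(preds)      # number of classes, number of samples
preds[, 1:2]    # look at a couple of columns rather than the whole matrix

# Row index of the largest probability in each column, shifted to 0-based labels.
pred.label <- apply(preds, 2, which.max) - 1

# Hypothetical true labels, only to show how accuracy could be computed.
true.label <- c(5, 0, 3)
accuracy <- mean(pred.label == true.label)
```

With the real preds you would drop the first four assignments and read true.label from your test file instead.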

Notice that I reuse the training data for prediction. I do so because I don't want to adjust the iterator's code, which would need to consume data both with and without a label in front (training and test sets). In a real-world scenario you would need to adjust the iterator so that it works with and without a label.
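
One way to approach that adjustment is to move the label handling into a small helper with a flag. The sketch below is plain R and does not touch MXNet at all: split_batch and has.label are names I made up, and I am assuming each batch arrives as a numeric matrix with one column per sample and, when a label is present, the label in the first row. Check that against what your own iterator actually produces before relying on it.

```r
# Hypothetical helper, not part of MXNet: split one batch into features and labels.
# Assumes `batch` is a numeric matrix with one column per sample and,
# when has.label = TRUE, the label stored in the first row.
split_batch <- function(batch, has.label = TRUE) {
  if (has.label) {
    x <- batch[-1, , drop = FALSE]  # everything except the label row
    y <- batch[1, ]                 # the label row
  } else {
    x <- batch
    y <- rep(0, ncol(batch))        # placeholder so the result keeps the same shape
  }
  list(data = x, label = y)
}

# Toy batch: 2 samples, each a label followed by 4 feature values.
labelled <- matrix(c(5, 0.1, 0.2, 0.3, 0.4,
                     0, 0.5, 0.6, 0.7, 0.8), nrow = 5)
unlabelled <- labelled[-1, , drop = FALSE]

a <- split_batch(labelled, has.label = TRUE)
b <- split_batch(unlabelled, has.label = FALSE)
```

Inside the iterator you would call something like this before reshaping the data and converting it to NDArrays. Keep in mind that an unlabelled CSV has one column fewer per row, so the shape given to the underlying CSV reader has to change as well.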

Sergei