
Model training system: AWS Ubuntu p2.xlarge, R 3.4.0, mxnet_1.0.1. Saved via:

mx.model.save(A3.MXmodel, "Action/A3.MXmodel", iteration = 3000)

Loading on same system works fine via:

A3.MXmodel <- mx.model.load("A3.MXmodel", iteration=3000)
A3.pred <- predict(A3.MXmodel, as.matrix(nNewVector))
A3.pred.label = max.col(t(A3.pred))-1

I moved the model files to a new system (an AMI clone of the first instance, but on a g2.xlarge) and attempted to predict:

A3.pred <- predict(A3.MXmodel, as.matrix(nNewVector))

This leads to an immediate crash of RStudio, with no data saved and no error messages. I can confirm mxnet is working on the new instance via the installation check:

library(mxnet)
a <- mx.nd.ones(c(2,3), ctx = mx.gpu())
b <- a * 2 + 1
b

Do I have to specify somewhere on the new instance that the models are based on GPU devices? Can a model trained on a GPU instance be run on a CPU instance with a CPU mxnet build?

Garglesoap
  • This might be obvious, but have you confirmed that the exact same code works fine on p2.x? – Sina Afrooze Jan 26 '18 at 07:14
  • Good idea, seems to work fine on p3.xlarge. Too bad because the g2's are much cheaper. Might move system to CPU only EC2s. Thanks – Garglesoap Jan 28 '18 at 01:09
  • Any luck with CPU only EC2s for the model inference @Garglesoap? My guess is that your model was too large for the g2.xlarge instance. It only has 4GB of available GPU memory, compared to 12GB on the p2.xlarge where the model was trained. – Thom Lane Feb 23 '18 at 01:14
  • The same models are working fine on other p2 and p3 instances, so I suspect your idea about gpu ram might be correct: the p2 and p3 are different gpu architectures – Garglesoap Feb 23 '18 at 01:20
  • I agree that the model might be too large. I would also recommend checking the log files as explained here: https://support.rstudio.com/hc/en-us/articles/200554756-RStudio-Application-Logs Alternatively, if nothing interesting is found in the logs, you can see what is going on by running your failing script via the R REPL. Just type R in your terminal, paste the code, and hit enter. If it fails, it won't close the terminal, and you will be able to see the error message. – Sergei Feb 23 '18 at 18:17

1 Answer


To answer the specific questions:

Do I have to specify somewhere on the new instance that the models are based on GPU devices?

No. Only the model's structure and parameters are stored; there is no encoding of the hardware it was trained on.

Can a model trained on a GPU instance be run on a CPU instance with a CPU mxnet build?

Yes, and that is often desirable: train on the GPU for speed, then run inference on the CPU, which is cheaper both computationally and in instance cost.
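As a minimal sketch of the CPU-inference approach, assuming the saved files (A3.MXmodel-symbol.json and A3.MXmodel-3000.params) are in the working directory and nNewVector holds the new inputs, you can pass a ctx argument to predict() to keep everything off the GPU:

```r
library(mxnet)

# Load the GPU-trained model; the saved files carry no device information,
# so this works identically on a CPU-only build.
A3.MXmodel <- mx.model.load("A3.MXmodel", iteration = 3000)

# predict() accepts a ctx argument; mx.cpu() forces inference onto the CPU,
# sidestepping the g2.xlarge's smaller (4GB) GPU memory entirely.
A3.pred <- predict(A3.MXmodel, as.matrix(nNewVector), ctx = mx.cpu())
A3.pred.label <- max.col(t(A3.pred)) - 1
```

If the crash on the g2.xlarge is indeed a GPU out-of-memory failure, forcing ctx = mx.cpu() this way should let the same model files run there without change.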