I'm looking to do some distributed computation on GPUs for machine learning. Does anybody have experience with MXNet (performance vs. Theano)?
Reference http://www.cs.cmu.edu/~muli/file/mxnet-learning-sys.pdf
Thanks
I have a lot of experience with both MXNet and Theano (via Lasagne and Keras).
Benchmarking is always biased, so I will not comment on that, except to note that all the frameworks are very fast. Here are several things that should help you decide:
- Theano compared to MXNet is like assembly compared to Python. Theano provides low-level primitives for building machine-learning models and does not itself define any layers or optimizers, so you would usually use it through a deep-learning library such as Lasagne or Keras, while MXNet is higher level. A fair comparison would therefore be MXNet vs. Keras, not MXNet vs. Theano.
- MXNet is a more recent library, so some things in it are not as polished yet, and there are far fewer resources online than for Theano.
- Theano (and therefore Lasagne and Keras) compiles models into C++ and CUDA the first time they run, which is very slow. For a very complex model, such as an unrolled LSTM, compilation can take a good couple of minutes. That is usually negligible compared to the time the model takes to train (hours to weeks), but it is very annoying when you prototype.

Overall, if you are choosing between these two frameworks, I would suggest Theano + Keras for everything except recurrent or very deep networks; for those, the compilation time in Theano will kill you.
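To make the "assembly vs. Python" analogy concrete, here is a toy sketch in plain Python (not actual Theano or Keras code; the names `forward_lowlevel` and `Dense` are made up for illustration). The low-level style spells out every operation of a logistic-regression forward pass by hand, the way you would wire up Theano primitives; the high-level style wraps the same arithmetic in a reusable layer object, the way Keras or MXNet would:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Low-level style: write the forward pass yourself from primitive ops,
# roughly how you would compose a model out of Theano expressions.
def forward_lowlevel(x, w, b):
    z = sum(xi * wi for xi, wi in zip(x, w)) + b  # dot product + bias
    return sigmoid(z)

# High-level style: a ready-made "layer" object hides the same arithmetic,
# roughly how a Keras/MXNet layer hides the underlying ops.
class Dense:
    def __init__(self, w, b, activation=sigmoid):
        self.w, self.b, self.activation = w, b, activation

    def __call__(self, x):
        z = sum(xi * wi for xi, wi in zip(x, self.w)) + self.b
        return self.activation(z)

x, w, b = [1.0, 2.0], [0.5, -0.25], 0.1
layer = Dense(w, b)
# Both styles compute exactly the same thing; they differ in abstraction level.
assert abs(forward_lowlevel(x, w, b) - layer(x)) < 1e-12
```

The high-level version is what you compose models from day to day; the low-level version is what you drop down to when you need an operation the framework does not ship with.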
Also look into TensorFlow. It is (subjectively) slower than MXNet, but it is more mature and has more resources online.