In the code of my custom operator, I have lines like these:
for i in xrange(batch_size):
    numpy.XXX
For better performance, I use multiprocessing, but it gets stuck.
A full example of your custom operator would probably help to diagnose the issue, but I can give a couple of suggestions based on the code snippet you have provided:
Don't use NumPy in custom operators. Using NumPy forces MXNet to copy the data to the CPU, which makes the computation slower than what you could get on a GPU. NumPy cannot run on GPUs.
Don't use a Python for loop to go through the items in your batch. For performance reasons, all MXNet operators support batched input, so you can apply them to the whole batch at once.
If you still need loops in your custom operator over other dimensions, please use the foreach operator. It is designed to work with both NDArray and Symbol. Check out the tutorial on how to use it.