
I am trying to make an operator in MXNet that will introduce sparsity in the output in the following way:

  • Pruning each data-point separately (axis 0 indexes the data-points)
  • Setting the smaller values to 0
  • Keeping the same dimensions as the input

I am currently doing this with the following piece of code (assuming act is the input to this operator):

import mxnet as mx
flat = mx.sym.flatten(act)                              # collapse all non-batch axes: (batch, features)
n_feat = flat.infer_shape(data=shape)[1][0][1]          # number of features per data-point
mask = mx.sym.topk(flat, k=int(frac * n_feat), axis=1, ret_typ='mask')  # 1 for the top-k entries per row, 0 elsewhere
mask = mask.reshape(act.infer_shape(data=shape)[1][0])  # restore the original shape of act
custom = mx.sym.where(mask == 1, act, mask)             # keep the top-k activations, zero out the rest

With this implementation there is a limit on the size of the tensor act: a very big tensor, when flattened and passed into topk, results in an IndexFill error:

[20:27:53] /home/ubuntu/mxnet/dmlc-core/include/dmlc/logging.h:304: [20:27:53] /home/ubuntu/mxnet/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:58: too large launch parameter: IndexFill[100352,1], [32,32,1]

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fb593bbc9ac]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow4cuda9IndexFillIffEEvNS_6TensorINS_3gpuELi2ET0_EERKNS2_IS3_Li1ET_EERKS5_+0x492) [0x7fb59581bf82]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet2op8TopKImplIN7mshadow3gpuEEEvNS_10RunContextENS_8ResourceERKNS_5TBlobERKSt6vectorIS6_SaIS6_EERKNS0_9TopKParamE+0x3ca1) [0x7fb595841521]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet2op4TopKIN7mshadow3gpuEEEvRKN4nnvm9NodeAttrsERKNS_9OpContextERKSt6vectorINS_5TBlobESaISC_EERKSB_INS_9OpReqTypeESaISH_EESG_+0x345) [0x7fb595842cc5]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(+0x1318cf9) [0x7fb5947aecf9]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x8c) [0x7fb5947ef07c]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x60) [0x7fb5947f2190]
[bt] (7) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7fb5a3c45a60]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7fb5a9e07184]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fb5a9b34bed]

So my questions are:

  • The code currently works with a very small batch size. Is there a way to increase the batch size and avoid the error?
  • Is there a better way of implementing the operator?

1 Answer


The cause of the problem lies in the implementation of the GPU operator's kernel, specifically in the number of threads and blocks, and thus the grid dimensions, requested at the kernel launch.

In particular, the NVIDIA CUDA compute capabilities specify a maximum number of threads per block as well as a maximum number of blocks along each grid dimension. See, for example, http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities.

In your case, the limit of 65535 blocks is exceeded along the first grid dimension: the error message shows a launch of IndexFill[100352,1], and 100352 > 65535. In MXNet this threshold is also defined as kMaxGridDim, which is why the error is thrown.
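
To make this concrete, here is a minimal sanity check with the numbers taken from your error message (just a sketch; the constant name follows the answer above):

kMaxGridDim = 65535                # per-dimension grid limit referenced above
grid_x, grid_y = 100352, 1         # launch parameters from the error: IndexFill[100352,1]
if grid_x > kMaxGridDim:           # 100352 > 65535, so the launch is rejected
    raise RuntimeError('too large launch parameter: IndexFill[%d,%d]' % (grid_x, grid_y))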

There may be different ways to solve the problem: changing the specific operator itself, i.e. the number of threads requested for the kernel launch and possibly the kernel code; alternatively, a fix in the generic MXNet GPU kernel launch function could do the trick as well.
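
In the meantime, a possible user-side workaround is to run topk on slices along axis 0 and stitch the masks back together, so each launch stays smaller. This is an untested sketch: it assumes the oversized launch dimension scales with the batch, and batch_size and chunk are hypothetical constants you would choose (act, frac and shape as in your question):

flat = mx.sym.flatten(act)
k = int(frac * flat.infer_shape(data=shape)[1][0][1])
parts = []
for i in range(0, batch_size, chunk):                 # batch_size, chunk: assumed constants
    s = mx.sym.slice_axis(flat, axis=0, begin=i, end=min(i + chunk, batch_size))
    parts.append(mx.sym.topk(s, k=k, axis=1, ret_typ='mask'))
mask = mx.sym.Concat(*parts, dim=0).reshape(act.infer_shape(data=shape)[1][0])
custom = mx.sym.where(mask == 1, act, mask)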

I will look into it tomorrow and update my answer when the problem is fixed.

edit: The issue has been addressed and resolved: https://github.com/dmlc/mshadow/pull/277

Stefan