How can we use an array of 10000 rows and 10000 cols (instead of rows = 10 and cols = 5) with Alea GPU?

private void button3_Click(object sender, EventArgs e)
{
    var worker = Worker.Default;
    const int rows = 10;
    const int cols = 5;
    var rng = new Random();
    var inputs = new double[rows, cols];
    for (var row = 0; row < rows; ++row)
    {
        for (var col = 0; col < cols; ++col)
        {
            inputs[row, col] = rng.Next(1, 100);
        }
    }
    var dInputs = worker.Malloc(inputs);
    var dOutputs = worker.Malloc<double>(rows, cols);
    var lp = new LaunchParam(1, 1);
    worker.Launch(Kernel, lp, dOutputs.Ptr, dInputs.Ptr, rows, cols);
    var outputs = new double[rows, cols];
    dOutputs.Gather(outputs);
    Assert.AreEqual(inputs, outputs);
}

If I use rows = 10000 and cols = 10000 (instead of rows = 10 and cols = 5):

I get the error "An unhandled exception of type 'Alea.CUDA.CUDAInterop.CUDAException' occurred in Alea.CUDA.dll" in the function public static void Gather(this DeviceMemory dmem, T[,] array2D):

    dmem.Worker.EvalAction(() =>
        {
            CUDAInterop.cuSafeCall(CUDAInterop.cuMemcpyDtoH(hostPtr, devicePtr,
                new IntPtr(Intrinsic.__sizeof<T>() * rows * cols)));
        });  

How can I fix this error?

talonmies
Emmanuel

1 Answer

First, CUDAException has an enum field, so you can debug to find out which CUDA error it is, for example with the following code:

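        try
        {
            worker.Launch(Kernel, lp, dOutputs.Ptr, dInputs.Ptr, rows, cols);
            dOutputs.Gather(outputs);
        }
        catch (CUDAInterop.CUDAException x)
        {
            // Data0 is the enum field that carries the CUDA error code.
            var code = x.Data0;
            Console.WriteLine("ErrorCode = {0}", code);
            Assert.Fail();
        }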

I ran it with a 10000x10000 matrix and got the error CUDA_ERROR_LAUNCH_FAILED. The kernel fails to execute because you are iterating over a big matrix in ONE thread (the launch uses LaunchParam(1, 1)). I wrote this test only to show how to use a 2D array. If you are doing something real and big, avoid running such a simple kernel in a single thread: the kernel runs for so long that the CUDA driver decides it has taken too much time and kills the execution. So design a new, truly parallel kernel to iterate over a big matrix.
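As a rough sketch of what such a parallel kernel might look like, the version below spreads the element copy over many threads with a grid-stride loop. Several things here are assumptions, since the question does not show the original Kernel: the name ParallelKernel, the deviceptr<double> parameters, the [AOTCompile] attribute, and the block and grid sizes are illustrative only, and how the blockIdx, blockDim, threadIdx and gridDim intrinsics come into scope may depend on your Alea GPU version.

```csharp
using Alea.CUDA;

// Hypothetical parallel replacement for the single-threaded test kernel.
// Assumption: the 2D array is stored as one contiguous block of rows * cols
// elements, so it can be addressed with a single flat index.
[AOTCompile]
static void ParallelKernel(deviceptr<double> outputs, deviceptr<double> inputs, int rows, int cols)
{
    var total = rows * cols;                                // 10000 * 10000 still fits in an int
    var start = blockIdx.x * blockDim.x + threadIdx.x;      // first element for this thread
    var stride = gridDim.x * blockDim.x;                    // total number of launched threads

    // Grid-stride loop: each thread handles elements start, start + stride, ...
    for (var i = start; i < total; i += stride)
    {
        outputs[i] = inputs[i];
    }
}

// Launch with many blocks and threads instead of LaunchParam(1, 1).
// The sizes are illustrative; tune them for your GPU.
const int blockSize = 256;
const int gridSize = 1024;
var lp = new LaunchParam(gridSize, blockSize);
worker.Launch(ParallelKernel, lp, dOutputs.Ptr, dInputs.Ptr, rows, cols);
```

With this layout each thread does only a small share of the work, so the kernel should finish well before the driver would kill it. The grid-stride loop also means the launch size does not have to match the matrix size exactly.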

Xiang Zhang
  • Thank you so much for your answer, I had the same exact issue and increasing the parallelism did in fact solve the issue for me! – Sergio0694 Oct 17 '17 at 22:33