I'm trying to do matrix addition using Alea's cuBLAS Axpy, but it seems to add only the top row:

let matrixAddition (a:float[,]) (b: float[,]) =
     use mA = gpu.AllocateDevice(a)
     use mB = gpu.AllocateDevice(b)
     blas.Axpy(a.Length,1.,mA.Ptr,1,mB.Ptr,1)
     Gpu.Copy2DToHost(mB)
JokingBear

2 Answers

I took your example and it runs fine.

Code:

        var gpu = Gpu.Default;
        var blas = Blas.Get(Gpu.Default);

        var hostA = new float[,]
        {
            {1, 2, 3},
            {4, 5, 6},
            {7, 8, 9},
        };

        var hostB = new float[,]
        {
            {10, 20, 30},
            {40, 50, 60},
            {70, 80, 90},
        };

        PrintArray(hostA);
        PrintArray(hostB);

        var deviceA = gpu.AllocateDevice(hostA);
        var deviceB = gpu.AllocateDevice(hostB);

        blas.Axpy(deviceA.Length, 1f, deviceA.Ptr, 1, deviceB.Ptr, 1);

        var hostC = Gpu.Copy2DToHost(deviceB);

        PrintArray(hostC);

Print Helper:

    private static void PrintArray(float[,] array)
    {
        for (var i = 0; i < array.GetLength(0); i++)
        {
            for (var k = 0; k < array.GetLength(1); k++)
            {
                Console.Write("{0} ", array[i, k]);
            }

            Console.WriteLine();
        }

        Console.WriteLine(new string('-', 10));
    }

This is what I get:

[Image: program output]

Two questions:

- What version of Alea GPU are you using?
- What version of the CUDA Toolkit are you using?

I built my sample against Alea 3.0.4-beta2, and I have CUDA Toolkit 8.0 installed.

Just to be sure, I also wrote your example in F# (I'm not fluent in F#).

Code:

let gpu = Gpu.Default;
let blas = Blas.Get(Gpu.Default);

let hostA: float[,] = array2D [[  1.0;  2.0;  3.0 ]; [  4.0;  5.0;  6.0 ]; [  7.0;  8.0;  9.0 ]]
let hostB: float[,] = array2D [[ 10.0; 20.0; 30.0 ]; [ 40.0; 50.0; 60.0 ]; [ 70.0; 80.0; 90.0 ]]

PrintArray(hostA)
PrintArray(hostB)

use deviceA = gpu.AllocateDevice(hostA);
use deviceB = gpu.AllocateDevice(hostB);

blas.Axpy(deviceA.Length, 1.0, deviceA.Ptr, 1, deviceB.Ptr, 1);

let hostC = Gpu.Copy2DToHost(deviceB);

PrintArray(hostC)

Print Helper:

let PrintArray(array: float[,]): unit =
    for i in 0 .. array.GetLength(0) - 1 do
        for k in 0 .. array.GetLength(1) - 1 do
            Console.Write("{0} ", array.[i, k]);
        Console.WriteLine();

    Console.WriteLine(new string('-', 10));
redb

There is one important difference between JokingBear's code and redb's code.

In this line of the problematic code:

blas.Axpy(a.Length,1.,mA.Ptr,1,mB.Ptr,1)

a has type float[,], so a.Length is the number of elements in the matrix (9 for a 3x3 matrix).

However, the working code uses this:

blas.Axpy(deviceA.Length, 1f, deviceA.Ptr, 1, deviceB.Ptr, 1);

deviceA is no longer a float[,] but a DeviceMemory2D object.

DeviceMemory2D.Length is surprisingly larger than (float[,]).Length (384 for a 3x3 matrix on my hardware), because the allocation on the GPU occupies much more space than the host array. I don't know the exact reason, although 384 = 3 x 128 would be consistent with each row being padded to a fixed width.

JokingBear's code sums only the top row because (float[,]).Length is too short to cover the data on the GPU, which is spread over a much longer span of memory. This has nothing to do with the version of Alea.
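To make the effect concrete, here is a minimal CPU-only sketch that does not call Alea at all. It assumes the 2D allocation pads every row to 128 elements; that figure is only inferred from the Length of 384 reported above for a 3x3 matrix, and the real layout may differ. PitchedAxpyDemo, ToPitched, FromPitched and the local Axpy are hypothetical stand-ins, not Alea APIs.

```csharp
using System;

public static class PitchedAxpyDemo
{
    // Assumed row width in elements, padding included (inferred from 384 / 3).
    public const int Pitch = 128;

    // Copy a host matrix into a flat buffer whose rows start Pitch elements apart.
    public static float[] ToPitched(float[,] host)
    {
        int rows = host.GetLength(0), cols = host.GetLength(1);
        var buffer = new float[rows * Pitch];
        for (var i = 0; i < rows; i++)
            for (var k = 0; k < cols; k++)
                buffer[i * Pitch + k] = host[i, k];
        return buffer;
    }

    // Copy the valid part of each padded row back into a rows x cols matrix.
    public static float[,] FromPitched(float[] buffer, int rows, int cols)
    {
        var host = new float[rows, cols];
        for (var i = 0; i < rows; i++)
            for (var k = 0; k < cols; k++)
                host[i, k] = buffer[i * Pitch + k];
        return host;
    }

    // CPU stand-in for axpy with unit strides: y[i] += alpha * x[i] for i < n.
    public static void Axpy(int n, float alpha, float[] x, float[] y)
    {
        for (var i = 0; i < n; i++)
            y[i] += alpha * x[i];
    }

    // Add a to b through the padded layout, processing only the first n elements.
    public static float[,] Add(float[,] a, float[,] b, int n)
    {
        var x = ToPitched(a);
        var y = ToPitched(b);
        Axpy(n, 1f, x, y);
        return FromPitched(y, b.GetLength(0), b.GetLength(1));
    }

    public static void Main()
    {
        var a = new float[,] { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } };
        var b = new float[,] { { 10, 20, 30 }, { 40, 50, 60 }, { 70, 80, 90 } };

        // n = 9 (host Length): covers row 0 plus six padding slots, nothing else.
        var shortCall = Add(a, b, a.Length);
        Console.WriteLine("{0} {1}", shortCall[0, 0], shortCall[1, 0]); // prints "11 40"

        // n = 384 (padded length): every row is reached.
        var fullCall = Add(a, b, a.GetLength(0) * Pitch);
        Console.WriteLine("{0} {1}", fullCall[0, 0], fullCall[1, 0]); // prints "11 44"
    }
}
```

With n = 9 the loop stops inside the padding of the first row, so rows two and three are returned unchanged, which matches the symptom in the question. Passing the padded length reaches all rows, which is what deviceA.Length does in the working code.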

koonyook