CUDAfy.Net / OpenCL, struct containing byte array results in non-blittable exception

Question

Ok, so I'm using CUDAfy.Net, and I have the following 3 structs:

[Cudafy]
public struct Collider
{
    public int Index;
    public int Type;
    public Sphere Sphere;
    public Plane Plane;
    public Material Material;
}

[Cudafy]
public struct Material
{
    public Color Color;
    public Texture Texture;
    public float Shininess;
}

[Cudafy]
public struct Texture
{
    public int Width, Height;
    public byte[ ] Data;
}

Now, as soon as I send over an array of Collider objects to the GPU, using

CopyToDevice<GPU.Collider>( ColliderArray );

I get the following error:

An unhandled exception of type 'System.ArgumentException' occurred in mscorlib.dll
Additional information: Object contains non-primitive or non-blittable data.

Does anyone with experience with either CUDAfy.Net or OpenCL (since it basically compiles down into OpenCL) have any idea how I could accomplish this? The problem lies in the byte array of Texture: everything worked fine before I added the Texture struct, and as far as I know the array is the non-blittable part. I found several questions about the same problem, and they fixed it with fixed-size arrays. I can't do that, though, because these are textures, which can vary greatly in size.
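The exception text matches what the .NET runtime raises when it is asked to pin an object that is not blittable, so the failure can be reproduced without a GPU. The following is a minimal sketch, assuming CUDAfy pins the host array before copying it (an assumption, since the question shows no stack trace). The names `WithArray`, `PrimitivesOnly`, and `CanPin` are made up for illustration and are not part of CUDAfy.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-ins for the Texture struct; not part of CUDAfy.
public struct WithArray
{
    public int Width, Height;
    public byte[] Data;       // managed reference: makes the struct non-blittable
}

public struct PrimitivesOnly
{
    public int Width, Height;
    public long DataOffset;   // plain integer: the struct stays blittable
}

public static class BlittableCheck
{
    // Returns true if the runtime can pin the array,
    // i.e. its element type is blittable.
    public static bool CanPin(Array array)
    {
        try
        {
            GCHandle handle = GCHandle.Alloc(array, GCHandleType.Pinned);
            handle.Free();
            return true;
        }
        catch (ArgumentException)
        {
            // "Object contains non-primitive or non-blittable data."
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CanPin(new WithArray[1]));
        Console.WriteLine(CanPin(new PrimitivesOnly[1]));
    }
}
```

Pinning should fail for the array whose element type holds a `byte[]` field and succeed for the one with only primitive fields, which is consistent with the copy working before the Texture struct was added.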

EDIT: Right now, I'm doing the following on the CPU:

    public unsafe static GPU.Texture CreateGPUTexture( Cudafy.Host.GPGPU _GPU, System.Drawing.Bitmap Image )
    {
        GPU.Texture T = new GPU.Texture( );
        T.Width = Image.Width;
        T.Height = Image.Height;
        byte[ ] Data = new byte[ Image.Width * Image.Height * 3 ];


        for ( int X = 0; X < Image.Width; X++ )
            for ( int Y = 0; Y < Image.Height; Y++ )
            {
                System.Drawing.Color C = Image.GetPixel( X, Y );
                int ID = ( X + Y * Image.Width ) * 3;
                Data[ ID ] = C.R;
                Data[ ID + 1 ] = C.G;
                Data[ ID + 2 ] = C.B;
            }

        byte[ ] _Data = _GPU.CopyToDevice<byte>( Data );
        IntPtr Pointer = _GPU.GetDeviceMemory( _Data ).Pointer;
        T.Data = ( byte* )Pointer.ToPointer( );

        return T;
    }

I then attach this Texture struct to the colliders, and send them to the GPU. This all goes without any errors. However, as soon as I try to USE a texture on the GPU, like this:

    [Cudafy]
    public static Color GetTextureColor( int X, int Y, Texture Tex )
    {
        int ID = ( X + Y * Tex.Width ) * 3;
        unsafe
        {
            byte R = Tex.Data[ ID ];
            byte G = Tex.Data[ ID + 1 ];
            byte B = Tex.Data[ ID + 2 ];

            return CreateColor( ( float )R / 255f, ( float )G / 255f, ( float )B / 255f );
        }
    }

I get the following error:

An unhandled exception of type 'Cloo.InvalidCommandQueueComputeException' occurred in Cudafy.NET.dll
Additional information: OpenCL error code detected: InvalidCommandQueue.

The Texture struct looks like this, by the way:

    [Cudafy]
    public unsafe struct Texture
    {
        public int Width, Height;
        public byte* Data;
    }

I'm completely at a loss again..

I agree with your assessment that it is related to `byte[ ] Data` since it is variable size and likely storage is external to the `Texture` struct itself. OpenCL 1.x can't send host pointers to the device. You could either have fixed-size Texture objects, or have a set of them (e.g, small, medium, large, perhaps growing by 2x each level) and select the best-fit one for each texture. Or use low-level OpenCL calls and allocate exact-fit objects as native OpenCL buffers or images. — Dithermaster, Jun 22 '14 at 22:28
I think I'd rather use the low-level solution as this will save the most memory. Have you perhaps got an example on how to do this? Or some search terms which will most likely lead me to the right answers? — WolfCode, Jun 22 '14 at 22:55
Not offhand; just Google `clCreateBuffer` and you should find some examples. That API can both create and upload a buffer, or you can just create it and use clEnqueueWriteBuffer to upload it. — Dithermaster, Jun 24 '14 at 00:14
I've updated the original post, could you take a look if this is the right way? I'm now getting another error, but the copying to the GPU isn't a problem anymore. — WolfCode, Jun 24 '14 at 19:32
Sorry, I don't know CUDAfy so most of that is Greek to me. However, the whole `Pointer` and `ToPointer` stuff on the `GetDeviceMemory` object seems odd to me since you don't use pointers to communicate about device memory in OpenCL 1.x (you use buffer handles). The error you are getting is `InvalidCommandQueue` so I'd bark up that tree first -- who is in charge of command queues and why is it invalid? You may need to learn some OpenCL API and step into the CUDAfy codebase to solve this, or find someone who knows CUDAfy. — Dithermaster, Jun 24 '14 at 22:21
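The size-class idea from the first comment (fixed sizes growing by 2x per level, picking the best fit per texture) could be sketched on the host as follows. This is only an illustration in plain C#; `SizeClasses`, `BestFit`, and the `smallest` parameter are hypothetical names, not CUDAfy or OpenCL API.

```csharp
using System;

public static class SizeClasses
{
    // Returns the smallest capacity from the series
    // smallest, 2 * smallest, 4 * smallest, ... that can hold byteCount bytes.
    public static int BestFit(int byteCount, int smallest)
    {
        if (byteCount < 0 || smallest <= 0)
            throw new ArgumentOutOfRangeException();

        long capacity = smallest;          // long avoids overflow while doubling
        while (capacity < byteCount)
            capacity *= 2;

        if (capacity > int.MaxValue)
            throw new OverflowException("No size class fits in an int.");

        return (int)capacity;
    }
}
```

A 64 x 64 RGB texture (12288 bytes) with a smallest class of 4096 bytes would land in the 16384-byte class, so the trade-off is up to roughly 2x wasted memory per texture in exchange for a small set of fixed sizes.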

Miloš Selečéni · Answer 1 · 2014-06-30T22:23:24.677

Cudafy does not support arrays yet, so you can't use "public byte[] Data" either in structures or in the kernels themselves. You could try a less object-oriented approach: remove the data array from the structure itself and copy the two separately, e.g. copyToDevice("texture properties") and then copy the corresponding data array with copyToDevice("texture data").

EDIT: OK I found a solution but it is not pretty code.

Once you have the pointer to your data stored in GPU memory, convert it to an integer value with pointer.ToInt64() and store that value in your structure simply as a long (not as a long pointer). Then you can use the GThread.InsertCode() method to insert code directly into your kernel without compiling it. You cannot use pointers directly in your kernel code because they are not a blittable data type. Here is an example of my working code:

using System;
using Cudafy;
using Cudafy.Host;
using Cudafy.Translator;

class Program
{
    [Cudafy]
    public struct TestStruct
    {
        public double value;
        public long dataPointer; // your data pointer address
    }

    [Cudafy]
    public static void kernelTest(GThread thread, TestStruct[] structure, int[] intArray)
    {
        // Do something
        GThread.InsertCode("int* pointer = (int*)structure[0].dataPointer;");
        // Here you can access your data through the pointer: pointer[0], pointer[1] and so on
        GThread.InsertCode("structure[0].value = pointer[1];");
    }


    private unsafe static void Main(string[] args)
    {
        GPGPU gpuCuda = CudafyHost.GetDevice(eGPUType.Cuda, 0);
        CudafyModule km = CudafyTranslator.Cudafy();
        gpuCuda.LoadModule(km);

        TestStruct[] host_array = new TestStruct[1];
        host_array[0] = new TestStruct();

        // Copy the raw data to the device first
        int[] host_intArray = new[] {1, 8, 3};
        int[] dev_intArray = gpuCuda.CopyToDevice(host_intArray);

        // Get the device address of the data and store it in the struct as a long
        DevicePtrEx p = gpuCuda.GetDeviceMemory(dev_intArray);
        IntPtr pointer = p.Pointer;

        host_array[0].dataPointer = pointer.ToInt64();

        // Copy the struct (now holding the device address) to the device
        TestStruct[] dev_array = gpuCuda.Allocate(host_array);
        gpuCuda.CopyToDevice(host_array, dev_array);

        gpuCuda.Launch().kernelTest(dev_array, dev_intArray);

        gpuCuda.CopyFromDevice(dev_array, host_array);

        Console.WriteLine(host_array[0].value);

        Console.ReadKey();
    }
}

The "magic" is in InsertCode(), where you cast your long dataPointer value to an int pointer address. The disadvantage of this approach is that you must write those parts of the code as strings.

Or you can separate your data and structures, e.g.:

[Cudafy]
public struct Texture
{
    public int Width, Height;
}

[Cudafy]
public static void kernelTest(GThread thread, Texture[] TexStructure, byte[] Data)
{....}

And simply copy

dev_Data = gpu.CopyToDevice(host_Data);
dev_Texture = gpu.CopyToDevice(host_Texture);
gpu.Launch().kernelTest(dev_Texture, dev_Data);
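For several textures of different sizes, the same separation can be kept by packing all pixel data into one flat `byte[]` and storing an offset per texture. Below is a host-side sketch in plain C#, assuming tightly packed RGB data as in the question; `TextureInfo`, `DataOffset`, `TexturePacker`, and `PixelIndex` are hypothetical names, and the two resulting arrays would then be copied to the device as shown above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical blittable descriptor: only primitive fields.
public struct TextureInfo
{
    public int Width, Height;
    public int DataOffset;   // index of this texture's first byte in the shared buffer
}

public static class TexturePacker
{
    // Concatenates tightly packed RGB buffers (3 bytes per pixel)
    // and records where each texture starts in the combined buffer.
    public static void Pack(IList<byte[]> pixels, IList<int> widths, IList<int> heights,
                            out TextureInfo[] infos, out byte[] data)
    {
        int total = 0;
        foreach (byte[] p in pixels) total += p.Length;

        infos = new TextureInfo[pixels.Count];
        data = new byte[total];

        int offset = 0;
        for (int i = 0; i < pixels.Count; i++)
        {
            if (pixels[i].Length != widths[i] * heights[i] * 3)
                throw new ArgumentException("Texture " + i + " is not width * height * 3 bytes.");

            infos[i] = new TextureInfo { Width = widths[i], Height = heights[i], DataOffset = offset };
            Buffer.BlockCopy(pixels[i], 0, data, offset, pixels[i].Length);
            offset += pixels[i].Length;
        }
    }

    // Index of the red byte of pixel (x, y); green and blue follow at +1 and +2.
    public static int PixelIndex(TextureInfo info, int x, int y)
    {
        return info.DataOffset + (x + y * info.Width) * 3;
    }
}
```

Inside a kernel, the lookup from the question's `GetTextureColor` would then read from the shared `Data` array at the index computed as in `PixelIndex`, instead of dereferencing a per-texture pointer.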

EDIT TWO: forget about my code :D

Check this discussion: https://cudafy.codeplex.com/discussions/538310, and this one is the solution to your problem: https://cudafy.codeplex.com/discussions/283527

Could you perhaps explain this idea in more detail? I find it quite difficult to understand how I would go about sending data to the GPU without the arrays, somehow saving it there (this is mostly the part that is causing me trouble), and then sending the array separately to add it to the first data on the GPU. — WolfCode, Jun 23 '14 at 22:46
