I am trying to write a DLL function that allocates CUDA memory and returns a pointer to the CUDA (device) memory.

A second function should accept this pointer and do the calculation.

I want these operations to be separate because I need to do many calculations on the same data, and I am trying to avoid repeatedly copying the same data to GPU memory (it takes a lot of time).

Q: What do I need to add to my DLL to be able to export the pointer to i_d?

My DLL:

main.cpp:

  extern "C" __declspec(dllexport) int cuda_Malloc ( float *i, void **i_d, int N ){
     for( int x=0; x<N; x++ )   // the array index must be an integer
        i[x]=x;
     kernel_cuda_Malloc( i, i_d, N );
     return 0;
  }

  extern "C" __declspec(dllexport) int cuda_Calculation( void *i_d, float *result, int N ) {
     kernel_cuda_calculation( i_d, result, N );
     return 0;
  }

simple.cu:

  __global__ void kernelTest( float *i, int N ){
    unsigned int tid = blockIdx.x*blockDim.x + threadIdx.x;
    if ( tid<N )
       i[tid] += 10;
  }

  int kernel_cuda_Malloc( float *i, void **i_d, int N ){
     cudaMalloc( (void**)&i_d, N*sizeof( float ) );
     cudaMemcpy( i_d, i, N*sizeof( float ), cudaMemcpyHostToDevice );
     return 0;   
  }


  void kernel_cuda_calculation( float *i_d, float *result, int N ){
     dim3 threads; threads.x = 240;
     dim3 blocks; blocks.x = ( N/threads.x ) + 1;
     kernelTest<<< blocks, threads >>>( i_d, N );   // launch configuration is <<< grid, block >>>
     cudaMemcpy( result, i_d, N*sizeof( float ), cudaMemcpyDeviceToHost );
     cudaFree( i_d );

}

I am not able to get the pointer to i_d out of the cuda_Malloc function in LabVIEW.

The code is a modification of https://decibel.ni.com/content/docs/DOC-20353

CharlesB
user1281071

1 Answer

All CUDA functions are executed from within a CUDA context. To be able to transfer the pointer between functions, the context must also be preserved.

Your code does not make much sense. Both functions in the DLL are called cuda_Malloc. None of the functions actually return anything. Example code is good, but only when you take the time to provide what you think should work.

Edit: Sorry, I missed the fact that you are attempting to return your pointer by modifying a pointer that was passed in as an argument. For that to work, you must pass in a pointer to the pointer, not just the pointer.

int kernel_cuda_Malloc( float *i, void *i_d, int N ){

should be

int kernel_cuda_Malloc( float *i, void **i_d, int N ){
Roger Dahl
  • I have edited my question. It still returns the same value of i_d that I pass into cuda_Malloc. I use this DLL in LabVIEW. I pass a dummy value (zero) to the DLL as i_d, and I expect i_d to contain the device pointer after cuda_Malloc finishes, but it is still zero. – user1281071 Apr 26 '12 at 12:02
  • In kernel_cuda_Malloc, i_d is a pointer to a pointer. In the cudaMalloc call, you use "&" to make a pointer to *that*. Remove the "&". That error is hidden because you have an unnecessary cast. Just remove it. On the next line, you send the pointer to a pointer to cudaMemcpy, but it only takes a pointer. You need to dereference it once with "*". – Roger Dahl Apr 26 '12 at 14:04
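
The pointer handling described in the last comment can be sketched on the host alone. This is only an illustration under stated assumptions: `fake_device_malloc` and `alloc_and_upload` are hypothetical names, and the stand-in uses `std::malloc` and `std::memcpy` in place of `cudaMalloc` and `cudaMemcpy` so that it runs without a GPU. What it is meant to show is that `i_d` is passed straight through to the allocator (no `&`, no cast) and dereferenced once for the copy.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for cudaMalloc: same out-parameter shape,
// but it hands out host memory instead of device memory.
int fake_device_malloc(void **ptr, std::size_t bytes) {
    *ptr = std::malloc(bytes);
    return *ptr != nullptr ? 0 : 1;
}

// Hypothetical counterpart of kernel_cuda_Malloc from the question.
// i_d already points at the caller's pointer variable, so it is
// forwarded unchanged; taking &i_d would only modify the local copy.
int alloc_and_upload(const float *host, void **i_d, int n) {
    const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(float);

    // Real code would be: cudaMalloc(i_d, bytes)
    const int err = fake_device_malloc(i_d, bytes);
    if (err != 0) {
        return err;
    }

    // Real code would be:
    // cudaMemcpy(*i_d, host, bytes, cudaMemcpyHostToDevice)
    std::memcpy(*i_d, host, bytes);
    return 0;
}
```

A caller would declare `void *handle = nullptr;`, call `alloc_and_upload(data, &handle, n)`, and afterwards find the allocated address in `handle`. That value is what gets passed back into the calculation function on later calls.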