
I have these structs:

typedef struct neuron
{
    float* weights;
    int    n_weights;
} Neuron;


typedef struct neurallayer
{
    Neuron *neurons;
    int    n_neurons;
    int    act_function;
} NLayer;

"NLayer" struct can contain an arbitrary number of "Neuron"

I've tried to allocate an 'NLayer' struct with 5 'Neurons' from the host in this way:

NLayer* nL;
int i;
int tmp=9;
cudaMalloc((void**)&nL,sizeof(NLayer));
cudaMalloc((void**)&nL->neurons,6*sizeof(Neuron));
for(i=0;i<5;i++)
    cudaMemcpy(&nL->neurons[i].n_weights,&tmp,sizeof(int),cudaMemcpyHostToDevice);

...then I've tried to modify the "nL->neurons[0].n_weights" variable with this kernel:

__global__ void test(NLayer* n)
{
    n->neurons[0].n_weights = 121;
}

but at compile time nvcc emits this warning about the only line of the kernel:

Warning: Cannot tell what pointer points to, assuming global memory space

and when the kernel finishes its work the struct becomes unreachable.

It's very probable that I'm doing something wrong during the allocation... can someone help me? Thanks very much, and sorry for my English! :)

UPDATE:

Thanks to aland I've modified my code, creating this function that should allocate an instance of the "NLayer" struct:

NLayer* setNLayer(int numNeurons,int weightsPerNeuron,int act_fun)
{
    int i;
    NLayer  h_layer;
    NLayer* d_layer;
    float*  d_weights;

    //SET THE LAYER VARIABLE OF THE HOST NLAYER
    h_layer.act_function=act_fun;
    h_layer.n_neurons=numNeurons;
    //ALLOCATING THE DEVICE NLAYER
    if(cudaMalloc((void**)&d_layer,sizeof(NLayer))!=cudaSuccess)
        puts("ERROR: Unable to allocate the Layer");
    //ALLOCATING THE NEURONS ON THE DEVICE
    if(cudaMalloc((void**)&h_layer.neurons,numNeurons*sizeof(Neuron))!=cudaSuccess)
        puts("ERROR: Unable to allocate the Neurons of the Layer");
    //COPYING THE HOST NLAYER TO THE DEVICE
    if(cudaMemcpy(d_layer,&h_layer,sizeof(NLayer),cudaMemcpyHostToDevice)!=cudaSuccess)
        puts("ERROR: Unable to copy the data layer onto the device");

    for(i=0;i<numNeurons;i++)
    {
        //ALLOCATING THE WEIGHTS' ARRAY ON THE DEVICE
        cudaMalloc((void**)&d_weights,weightsPerNeuron*sizeof(float));
        //COPYING ITS POINTER INTO THE i-TH NEURON STRUCT ON THE DEVICE
        if(cudaMemcpy(&d_layer->neurons[i].weights,&d_weights,sizeof(float*),cudaMemcpyHostToDevice)!=cudaSuccess)
            puts("Error: unable to copy weights' pointer to the device");
    }


    //RETURN THE DEVICE POINTER
    return d_layer;
}

and I call that function from main like this (the kernel "test" is declared earlier):

int main()
{
    NLayer* nL;
    int h_tmp1;
    float h_tmp2;

    nL=setNLayer(10,12,13);
    test<<<1,1>>>(nL);
    if(cudaMemcpy(&h_tmp1,&nL->neurons[0].n_weights,sizeof(float),cudaMemcpyDeviceToHost)!=cudaSuccess);
        puts("ERROR!!");
    printf("RESULT:%d",h_tmp1);

}

When I compile this code the compiler shows me the warning, and when I execute the program it prints:

Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
ERROR!!
RESULT:1

The last error does not appear if I comment out the kernel call.

Where am I wrong? I don't know what to do. Thanks for your help!

Andrea Sylar Solla

2 Answers


The problem is here:

cudaMalloc((void**)&nL,sizeof(NLayer));
cudaMalloc((void**)&nL->neurons,6*sizeof(Neuron));

In the first line, nL points to a structure in global memory on the device. Therefore, in the second line the first argument to cudaMalloc is an address residing on the GPU, which is undefined behaviour (on my test system it causes a segfault; in your case, though, something more subtle happens).

The correct way to do what you want is to first create the structure in host memory, fill it with data, and then copy it to the device, like this:

NLayer* nL;
NLayer h_nL;
int i;
int tmp=9;
// Allocate data on device
cudaMalloc((void**)&nL, sizeof(NLayer));
cudaMalloc((void**)&h_nL.neurons, 6*sizeof(Neuron));
// Copy nlayer with pointers to device
cudaMemcpy(nL, &h_nL, sizeof(NLayer), cudaMemcpyHostToDevice);

Also, don't forget to always check for any errors from CUDA routines.
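
As a minimal sketch of such error checking (the CHECK_CUDA macro name is my own choice, not part of the CUDA API):

#include <stdio.h>
#include <cuda_runtime.h>

// Print the location and the CUDA error string whenever a call fails.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess)                                       \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err));     \
    } while (0)

You can then wrap each call, e.g. CHECK_CUDA(cudaMalloc((void**)&nL, sizeof(NLayer)));, instead of repeating the if/puts pattern by hand.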

UPDATE

In the second version of your code:

cudaMemcpy(&d_layer->neurons[i].weights,&d_weights,...) --- again, you are dereferencing a device pointer (d_layer) on the host. Instead, you should use

cudaMemcpy(&h_layer.neurons[i].weights,&d_weights,sizeof(float*),cudaMemcpyHostToDevice);

Here you take h_layer (the host structure) and read its member (h_layer.neurons), which is a pointer to device memory. Then you do some pointer arithmetic on it (&h_layer.neurons[i].weights). No access to device memory is needed to compute this address.
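
Putting it together, the allocation loop in setNLayer would then look roughly like this (just a sketch of the relevant part, reusing the variable names from your function):

for(i=0;i<numNeurons;i++)
{
    //ALLOCATE THIS NEURON'S WEIGHTS ARRAY ON THE DEVICE
    cudaMalloc((void**)&d_weights,weightsPerNeuron*sizeof(float));
    //STORE THE DEVICE POINTER INTO THE i-TH NEURON ON THE DEVICE.
    //The destination address is computed from h_layer.neurons (a device
    //pointer held on the host), so the host never dereferences device memory.
    cudaMemcpy(&h_layer.neurons[i].weights,&d_weights,sizeof(float*),cudaMemcpyHostToDevice);
}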

aland
  • I've modified my code, but it doesn't work, could you take a look? The new code is in my initial post... Thanks! – Andrea Sylar Solla Aug 09 '12 at 12:08
  • Oh! Thank you, it works!! I've only one more question: if I want to access from the host the data contained in the integer variable **d_layer->neurons[0].n_weights**, must I first copy **d_layer** to the host, then copy **d_layer->neurons[0]** to the host, and only then read the d_layer->neurons[0].n_weights variable? I'm asking just because I've tried to copy d_layer->neurons[0].n_weights directly with cudaMemcpy(...), but it always returns an "invalid argument" error. – Andrea Sylar Solla Aug 09 '12 at 15:08
  • @AndreaSylarSolla You could simply use `int t; cudaMemcpy(&t, &h_layer.neurons[0].n_weights,....)` or `Neuron t; cudaMemcpy(&t, &h_layer.neurons[0],....)`. There is no need to copy `d_layer`, since you only need the value of the `neurons` pointer from it, and the very same value is already in `h_layer`. – aland Aug 09 '12 at 15:24
  • @AndreaSylarSolla I'd advise you to spend some time improving your understanding of pointers in GPGPU programming (since there are two distinct memory spaces, it's a little trickier than in CPU-only code). The important point in your question is that while `cuda*` functions can internally operate on memory on the GPU, their arguments are computed entirely on the CPU, and the CPU cannot directly access any values stored on the GPU (but if it has a pointer to device memory, it can compute an offset, so you can use `&h_layer.neurons[i]` in your host code, but not `h_layer.neurons[i]`). – aland Aug 09 '12 at 15:37
  • Yes, I noticed that during my testing. Thanks so much for your helpful advice! – Andrea Sylar Solla Aug 09 '12 at 16:05

It all depends on the GPU card you're using. Fermi cards use uniform addressing of shared and global memory space, while pre-Fermi cards don't.

In the pre-Fermi case, you don't know whether an address should be shared or global. The compiler can usually figure this out, but there are cases where it can't. When a pointer to shared memory is required, you usually take the address of a shared variable and the compiler can recognise this. The "assuming global" message appears when this cannot be determined explicitly.

If you are using a GPU with compute capability 2.x or higher, it should work with the -arch=sm_20 compiler flag.
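
For example, a build command along these lines (the file name here is only a placeholder):

nvcc -arch=sm_20 mycode.cu -o mycode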

Beau Bellamy
  • While you are right about the warning, I doubt it is what causes the abnormal behaviour of the program. After all, the compiler's assumption about the structure residing in global memory space is correct... – aland Aug 09 '12 at 01:52
  • I'm using an NVIDIA GeForce 320M 256 MB with compute capability 1.2, so I don't think it's a "Fermi" card – Andrea Sylar Solla Aug 09 '12 at 09:29