I have a square grid of ni x nj points, with ni = 256 and nj = 256. Here is the part of my code I have a question about:
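```cuda
u = (float *)malloc(ni*nj*sizeof(float));                          // host copy of the grid
cudaMallocPitch((void **)&u_data, &pitch, sizeof(float)*ni, nj);   // device grid: nj rows of ni floats
cudaMemset2D((void *)u_data, pitch, 0.2, sizeof(float)*ni, nj);    // meant to set every element to 0.2
cudaMemcpy2D((void *)u, pitch, (void *)u_data,                     // copy the grid back to the host
             sizeof(float)*ni, sizeof(float)*ni,
             nj, cudaMemcpyDeviceToHost);
printf("%f, %f\n", u[I2D(ni, 23, 54)], u[I2D(ni, 45, 67)]);       // I2D: my 2D index macro (not shown)
```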
As I understand it, the output should be: 0.2, 0.2
But it prints: 0.0, 0.0
Can somebody explain what the problem is and what I am doing wrong?