0

Non-normalized linear interpolation from a CUDA texture object bound to a CUDA array appears to be returning incorrect results. It appears that the interpolated values are a factor of 0.5 smaller than expected. Normalized linear interpolation appears to be working properly.

Is there something wrong in this code? Are we expected to multiply by 2 when doing non-normalized texture interpolation?

The code:

#include <iostream>
#include <cstdio>

// simple function to print an array
template <typename T>
void print_array(const T *a, const size_t length) {
  for (size_t i=0; i!=length; i++) {
    std::cout << "a[" << i << "]: " << a[i] << std::endl;
  }
}

// attempt to interpolate linear memory
__global__
void cuda_texture_interpolate(cudaTextureObject_t tex,
                              float start,
                              float stop,
                              int count) {
  if (count < 1) { count = 1; }
  float h = (stop-start)/((float)count);
  float x = start;
  float y;
  for (int i = 0; i != count; i++) {
    y = tex1D<float>(tex,x);
    printf("x: %4g ; y: %4g\n",x,y);
    x = x + h;
  }
  y = tex1D<float>(tex,x);
  printf("x: %4g ; y: %4g\n",x,y);
}

int main(void) {
  // set up host array
  int n = 5;
  float a_host[5] = {3,2,1,2,3};
  printf("printing array on host.\n");
  print_array(a_host,n);

  // allocate and copy to cuda array
  cudaChannelFormatDesc channelDesc =
      cudaCreateChannelDesc(32, 0, 0, 0,
                            cudaChannelFormatKindFloat);
  cudaArray* cuArray;
  cudaMallocArray(&cuArray, &channelDesc, n);

  // Copy to device memory some data located at address h_data
  // in host memory
  cudaMemcpyToArray(cuArray, 0, 0, a_host, n*sizeof(float),
                    cudaMemcpyHostToDevice);

  // create texture object
  cudaResourceDesc resDesc;
  memset(&resDesc, 0, sizeof(resDesc));
  resDesc.resType = cudaResourceTypeArray;
  resDesc.res.array.array = cuArray;

  cudaTextureDesc texDesc;
  memset(&texDesc, 0, sizeof(texDesc));
  texDesc.addressMode[0]   = cudaAddressModeClamp;
  texDesc.filterMode       = cudaFilterModeLinear;
  texDesc.readMode         = cudaReadModeElementType;
  //texDesc.normalizedCoords = 1;
  texDesc.normalizedCoords = 0;


  cudaResourceViewDesc resViewDesc;
  memset(&resViewDesc, 0, sizeof(resViewDesc));
  resViewDesc.format = cudaResViewFormatFloat1;
  resViewDesc.width = n;

  // create texture object
  cudaTextureObject_t tex;
  cudaCreateTextureObject(&tex, &resDesc, &texDesc, &resViewDesc);

  // call interpolation kernel
  printf("interpolate (f(x) -> y).\n");
  //cuda_texture_interpolate<<<1,1>>>(tex,0.0,1.0,10);
  cuda_texture_interpolate<<<1,1>>>(tex,0.0,5.0,10);

  // clean up
  cudaDestroyTextureObject(tex);
  cudaFreeArray(cuArray);

  printf("end of texture_object_interpolation.\n");
  return 0;
}

The result:

$ ./texture_object_interpolation
printing array on host.
a[0]: 3
a[1]: 2
a[2]: 1
a[3]: 2
a[4]: 3
interpolate (f(x) -> y).
x:    0 ; y:  1.5
x:  0.5 ; y:  1.5
x:    1 ; y: 1.25
x:  1.5 ; y:    1
x:    2 ; y: 0.75
x:  2.5 ; y:  0.5
x:    3 ; y: 0.75
x:  3.5 ; y:    1
x:    4 ; y: 1.25
x:  4.5 ; y:  1.5
x:    5 ; y:  1.5
end of texture_object_interpolation.

Please see this gist for the above code, a makefile, and code for normalized interpolation.

einpoklum
  • 118,144
  • 57
  • 340
  • 684
nwhsvc
  • 652
  • 1
  • 6
  • 15
  • 1
    To hit the center of each texel, you have to add 0.5 to each texture coordinate. That addition appears to be absent from the code. This may not be the only issue, but I haven't checked. – njuffa Jun 26 '13 at 01:59
  • Thanks @njuffa! The texel centers should be hit by the `x` coordinates at `0.5`, `1.5`, ... The `y` values come out correct if they are multiplied by 2. – nwhsvc Jun 26 '13 at 17:54
  • I've been in contact with an Nvidia engineer who suggested that I submit a bug report regarding this issue. I will comment about the results. – nwhsvc Jun 26 '13 at 18:10
  • NVIDIA has confirmed this as a bug in CUDA 5.0. They are investigating. – nwhsvc Jun 27 '13 at 18:14
  • 2
    Bug appears to have been resolved in CUDA 5.5 RC. – nwhsvc Jun 27 '13 at 20:31

1 Answers1

2

This was apparently caused by a bug in the CUDA 5.0 compiler, and fixed in the CUDA 5.5 release.

[This answer has been assembled from comments to get the question off the unanswered queue for the CUDA tag]

talonmies
  • 70,661
  • 34
  • 192
  • 269