I have exactly the same problem as described in the post: CUDA Error on cudaBindTexture2D

I even get the same error:

"error 18: invalid texture reference."

It also "wouldn't throw the error on cudaMalloc, but only on cudaBindTexture".

Unfortunately, the poster (Anton Roth) answered his own question in a manner that was a bit too cryptic for someone such as myself who is just starting out with CUDA:

The answer was in the comments, I used a sm that my GPU wasn't compatible to.

The "not compatible with GPU" part makes sense, since the sample program FluidsGL (called "Fluids (OpenGL Version)" in the NVIDIA CUDA Samples Browser) fails on my laptop but works fine on my desktop at work. Unfortunately, I still don't know what "in the comments" was referring to, or how to check which SM versions my GPU is compatible with.

Here is the code that seems to be causing the issue:

#define DIM 512

In main:

setupTexture(DIM, DIM);
bindTexture();

In fluidsGL_kernels.cu:

texture<float2, 2> texref;
static cudaArray *array = NULL;

void setupTexture(int x, int y)
{
    // Wrap mode appears to be the new default
    texref.filterMode = cudaFilterModeLinear;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float2>();

    cudaMallocArray(&array, &desc, y, x);
    getLastCudaError("cudaMalloc failed");
}

void bindTexture(void)
{
    cudaBindTextureToArray(texref, array);//this function itself doesn't throw the error but error 18 is caught by the function below
    getLastCudaError("cudaBindTexture failed");
}

Hardware information

Here is the output of deviceQuery:

Device 0: "GeForce 9800M GS"
  CUDA Driver Version / Runtime Version          5.0 / 5.0
  CUDA Capability Major/Minor version number:    1.1
  Total amount of global memory:                 1024 MBytes (1073741824 bytes)
  ( 8) Multiprocessors x (  8) CUDA Cores/MP:    64 CUDA Cores
  GPU Clock rate:                                1325 MHz (1.32 GHz)
  Memory Clock rate:                             799 Mhz
  Memory Bus Width:                              256-bit
  Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  768
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           8 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce 9800M GS

I know my GPU is kind of old, but it still runs most of the examples pretty well.

Dustin Soodak
  • You have a CC 1.1 GPU (cf. this line `CUDA Capability Major/Minor version number`). The main question is: how are you compiling your code? Could you paste the command that you use? Are you using `-gencode arch=compute_11,code=sm_11`? Also, [fluidsGL](http://docs.nvidia.com/cuda/cuda-samples/index.html#fluids--opengl-version-) should run fine on your hardware since the minimum SM version is 1.0 apparently. – BenC Jun 12 '13 at 05:40
  • It actually compiles fine (I'm using the Microsoft Visual Studio 2010 sample project without modification). The precompiled executable gives the same error as when I compile and debug on the laptop, but it runs fine on the desktop. – Dustin Soodak Jun 13 '13 at 04:43
  • Something similar appears in slightly different form twice in a row: -gencode=arch=compute_20,code=\"sm_20,compute_20\" -gencode=arch=compute_30,code=\"sm_30,compute_30\" – Dustin Soodak Jun 13 '13 at 04:50
  • You will not see any error during compilation, only during execution. If you read nvcc's help (`nvcc --help`), you will see that `-code` is used to *specify the names of nVidia gpus to generate code for*, and `-arch` is used to *specify the name of the class of nVidia GPU architectures for which the cuda input files must be compiled*. If you do not set a proper architecture, errors may happen when running the code on an inadequate GPU. If you only compile for `compute/sm_20` and `compute/sm_30`, your code will fail on your GPU (hence my first comment). – BenC Jun 13 '13 at 05:07
  • Shortened: "...\5_Simulations\fluidsGLfromwork>"...\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" -gencode=arch=compute_30,code=\"sm_30,compute_30\" --use-local-env --cl-version 2010 -ccbin "...Visual Studio 10.0\VC\bin" -I"./" -I"../../common/inc" -I"./" -I"../../common/inc" -I"...\CUDA\v5.0\include" -I"...\CUDA\v5.0\include" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -g -DWIN32 -DWIN32 -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MTd " -o "Win32/Debug/fluidsGL_kernels.cu.obj" "...\fluidsGL_kernels.cu" – Dustin Soodak Jun 13 '13 at 05:25
  • You should consider using [pastebin](http://pastebin.com/) for that kind of copy/paste on Stack Overflow. As you can see, you are generating code for CC 2.0/3.0 devices, while using a CC 1.1 device. – BenC Jun 13 '13 at 05:29
  • It worked! In Visual Studio (by the way, you need VS Professional or it doesn't work at all) I went to Configuration Properties -> CUDA C/C++ -> Device -> Code Generation and added "compute_11,code=sm_11". Then I did the same after right-clicking on fluidsGL_kernels.cu. – Dustin Soodak Jun 13 '13 at 06:16

1 Answer

You need to compile your code for the proper architecture (as explained in the post you linked).

Since you have a CC 1.1 device, use the following nvcc compilation options:

-gencode arch=compute_11,code=sm_11

The default Visual Studio project or Makefile may not compile for the proper architectures, so always make sure that it does.

For Visual Studio, refer to this answer: https://stackoverflow.com/a/14413360/1043187

For a Makefile, it depends. The CUDA SDK samples often have a GENCODE_FLAGS variable that you can modify.
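
To make the compatibility rule concrete, here is a minimal C++ sketch of how one might reason about whether a set of `-gencode` targets can run on a given device. It assumes the usual rule: binary code built for `sm_XY` runs on devices of the same major revision with an equal or higher minor revision, while PTX embedded via `code=compute_XY` can be JIT-compiled by the driver for any device of equal or higher compute capability. The names `Arch`, `Target`, `can_run` and `any_can_run` are hypothetical helpers for illustration only; they are not part of the CUDA toolkit, and the real driver may apply further checks.

```cpp
#include <vector>

// Hypothetical types for illustration; not part of the CUDA toolkit.
struct Arch {
    int cc_major;  // e.g. 1 for compute capability 1.1
    int cc_minor;  // e.g. 1 for compute capability 1.1
};

struct Target {
    Arch arch;    // the XY in code=sm_XY or code=compute_XY
    bool is_ptx;  // true for code=compute_XY (PTX), false for code=sm_XY (binary)
};

// Assumed rule: returns true if a device of capability `dev` can run target `t`.
bool can_run(const Arch& dev, const Target& t) {
    if (t.is_ptx) {
        // PTX can be JIT-compiled for an equal or higher compute capability.
        return dev.cc_major > t.arch.cc_major ||
               (dev.cc_major == t.arch.cc_major &&
                dev.cc_minor >= t.arch.cc_minor);
    }
    // Binary code needs the same major revision and an equal or higher minor.
    return dev.cc_major == t.arch.cc_major &&
           dev.cc_minor >= t.arch.cc_minor;
}

// Returns true if at least one embedded target is usable on the device.
bool any_can_run(const Arch& dev, const std::vector<Target>& targets) {
    for (const Target& t : targets) {
        if (can_run(dev, t)) return true;
    }
    return false;
}
```

With the flags shown in the comments (`sm_20`, `compute_20`, `sm_30`, `compute_30`), `any_can_run` returns false for a CC 1.1 device such as the GeForce 9800M GS, because every target is newer than the device. Adding an `sm_11` target makes it return true, which matches what fixed the problem here.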

BenC