Questions tagged [cuda]

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units). CUDA provides an interface to NVIDIA GPUs through a variety of programming languages, libraries, and APIs.

Before posting CUDA questions, please read "How to get useful answers to your CUDA questions" below.

CUDA has an online documentation repository, updated with each release, including references for APIs and libraries; user guides for applications; and a detailed CUDA C/C++ Programming Guide.

The CUDA platform enables application development using several languages and associated APIs, including:

  • CUDA C/C++, compiled with nvcc
  • CUDA Fortran, compiled with the NVIDIA HPC (formerly PGI) compilers
  • Python, through libraries such as Numba, CuPy, and PyCUDA

There also exist third-party bindings for using CUDA in other languages and programming environments, such as Managed CUDA for .NET languages (including C#).

You should ask questions about CUDA here on Stack Overflow, but if you have bugs to report you should discuss them on the CUDA forums or report them via the registered developer portal. You may want to cross-link to any discussion here on SO.

The CUDA execution model is not multithreading in the usual sense, so please do not tag CUDA questions with [multithreading] unless your question involves thread safety of the CUDA APIs, or the use of both normal CPU multithreading and CUDA together.

How to get useful answers to your CUDA questions

Here are a number of suggestions for users new to CUDA. Follow these suggestions before asking your question and you are much more likely to get a satisfactory answer!

  • Always check the result codes returned by CUDA API functions to ensure you are getting cudaSuccess. If you are not, and you don't know why, include the information about the error in your question. This includes checking for errors caused by the most recent kernel launch, which may not be available until you have called cudaDeviceSynchronize() or cudaStreamSynchronize(). More on checking for errors in CUDA in this question; a minimal error-checking sketch also follows this list.
  • If you are getting unspecified launch failure, it is possible that your code is causing a segmentation fault, meaning the code is accessing memory it has not been allocated. Try to verify that your indexing is correct, and check whether the CUDA Compute Sanitizer (or the legacy cuda-memcheck tool for older GPUs, available until CUDA 12) reports any errors. Note that both tools encompass more than the default Memcheck; the other tools (Racecheck, Initcheck, Synccheck) must be selected explicitly.
  • The debugger for CUDA, cuda-gdb, is also very useful when you are not really sure what you are doing. You can monitor resources at the warp, thread, block, SM, and grid level, and you can follow your program's execution. If a segmentation fault occurs in your program, cuda-gdb can help you find where the crash occurred and what the context is. If you prefer a GUI for debugging, there are IDE plugins/editions for Visual Studio (Windows), Visual Studio Code (Windows/Mac/Linux, but the GPU used for debugging must be on a Linux system) and Eclipse (Linux).
  • If you are finding that you are getting syntax errors on CUDA keywords when compiling device code, make sure you are compiling with nvcc (or clang with CUDA support enabled) and that your source file has the expected .cu extension. If CUDA device functions or feature namespaces you expect to work are not found (atomic functions, warp voting functions, half-precision arithmetic, cooperative groups, etc.), ensure that you are explicitly passing compilation arguments that select an architecture supporting those features (e.g. nvcc -arch=sm_70).
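As mentioned in the first suggestion above, here is a minimal error-checking sketch. The CUDA_CHECK macro name and the trivial kernel are illustrative conventions, not an official API:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call so failures are reported with file/line context.
#define CUDA_CHECK(call)                                                      \
    do {                                                                      \
        cudaError_t err = (call);                                             \
        if (err != cudaSuccess) {                                             \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                       \
                    cudaGetErrorString(err), __FILE__, __LINE__);             \
            exit(EXIT_FAILURE);                                               \
        }                                                                     \
    } while (0)

__global__ void dummyKernel(int *out) { out[threadIdx.x] = threadIdx.x; }

int main() {
    int *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, 32 * sizeof(int)));
    dummyKernel<<<1, 32>>>(d_out);
    CUDA_CHECK(cudaGetLastError());       // catches launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors raised during kernel execution
    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

Checking cudaGetLastError() immediately after the launch catches configuration problems, while the status returned by cudaDeviceSynchronize() reflects anything that went wrong during kernel execution.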

Books

14278 questions
5 votes, 1 answer

removing elements from a device_vector

thrust::device_vector values; thrust::device_vector keys; After initialization, keys contains some elements equal to -1. I want to delete those elements from keys, and the elements at the same positions in values, but I do not know how to do this in parallel.
GaoYuan
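A hedged sketch of one common way to approach the question above: compact keys and values in lockstep with thrust::remove_if over a zip iterator. The element types, sizes, and the -1 sentinel predicate are assumptions for illustration:

```cuda
#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/remove.h>
#include <thrust/tuple.h>

// Predicate that flags a (key, value) pair for removal when the key is -1.
struct key_is_minus_one {
    __host__ __device__
    bool operator()(const thrust::tuple<int, float> &t) const {
        return thrust::get<0>(t) == -1;
    }
};

int main() {
    thrust::device_vector<int>   keys(5);
    thrust::device_vector<float> values(5);
    // ... fill keys and values; some keys are -1 ...

    auto first = thrust::make_zip_iterator(thrust::make_tuple(keys.begin(), values.begin()));
    auto last  = thrust::make_zip_iterator(thrust::make_tuple(keys.end(),   values.end()));

    // remove_if compacts both sequences together and returns the new logical end.
    auto new_end = thrust::remove_if(first, last, key_is_minus_one());
    size_t new_size = new_end - first;

    keys.resize(new_size);
    values.resize(new_size);
    return 0;
}
```

remove_if only moves the surviving pairs to the front of the sequences; the resize calls discard the trailing, now-unspecified elements.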
5 votes, 1 answer

setting up a CUDA 2D "unsigned char" texture for linear interpolation

I have a linear array of unsigned chars representing a 2D array. I would like to place it into a CUDA 2D texture and perform (floating point) linear interpolation on it, i.e., have the texture call fetch the 4 nearest unsigned char neighbors,…
Jammy
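A hedged sketch of the texture-object setup the question above is asking about, with made-up dimensions and names. The essential settings are cudaFilterModeLinear together with cudaReadModeNormalizedFloat, so tex2D<float>() returns hardware-interpolated values scaled to [0,1]:

```cuda
#include <cuda_runtime.h>

__global__ void sampleKernel(cudaTextureObject_t tex, float *out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h) {
        // +0.5f samples at texel centers; non-integer coordinates interpolate linearly.
        out[y * w + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
    }
}

int main() {
    const int w = 64, h = 64;
    unsigned char hostData[w * h] = {0};  // linear array interpreted as a 2D image

    // Copy the data into a CUDA array with an 8-bit single-channel format.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<unsigned char>();
    cudaArray_t cuArray;
    cudaMallocArray(&cuArray, &desc, w, h);
    cudaMemcpy2DToArray(cuArray, 0, 0, hostData, w * sizeof(unsigned char),
                        w * sizeof(unsigned char), h, cudaMemcpyHostToDevice);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = cuArray;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0]   = cudaAddressModeClamp;
    texDesc.addressMode[1]   = cudaAddressModeClamp;
    texDesc.filterMode       = cudaFilterModeLinear;         // enable hardware interpolation
    texDesc.readMode         = cudaReadModeNormalizedFloat;  // required to filter integer data
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    float *d_out;
    cudaMalloc(&d_out, w * h * sizeof(float));
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    sampleKernel<<<grid, block>>>(tex, d_out, w, h);
    cudaDeviceSynchronize();

    cudaDestroyTextureObject(tex);
    cudaFreeArray(cuArray);
    cudaFree(d_out);
    return 0;
}
```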
5 votes, 1 answer

64 bit number support in CUDA

I kind of found various opinions on this topic, so this is why I decided to ask here. My question is: starting from which compute capability is int64_t supported in CUDA? I am running CUDA 5 on a Quadro 770M, and the following code works without a…
Zahari
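For context on the question above, a minimal sketch (names illustrative) showing 64-bit integer arithmetic in device code; it compiles for any compute capability, with the operations generally emulated through multiple 32-bit instructions on the hardware:

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// 64-bit integer addition in device code.
__global__ void addKernel(int64_t *out, int64_t a, int64_t b) {
    *out = a + b;
}

int main() {
    int64_t *d_out, h_out = 0;
    cudaMalloc(&d_out, sizeof(int64_t));
    addKernel<<<1, 1>>>(d_out, 3000000000LL, 4000000000LL);
    cudaMemcpy(&h_out, d_out, sizeof(int64_t), cudaMemcpyDeviceToHost);
    printf("%lld\n", (long long)h_out);  // prints 7000000000, beyond 32-bit range
    cudaFree(d_out);
    return 0;
}
```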
5 votes, 1 answer

Surface reference faster than Surface object

I recently replaced the surface reference in my algorithm with a surface object. Then I noticed that the program runs slower. Here is a comparison for a simple example where I fill a 3D float array [400*400*400] with a constant value. Surface…
Arnaud
5 votes, 1 answer

Strange behavior when detecting global memory

After reading this question: "How to differentiate between pointers to shared and global memory?", I decided to try isspacep.local, isspacep.global and isspacep.shared in a simple test program. The tests for local and shared memory work all the…
BenC
5 votes, 1 answer

How to differentiate between pointers to shared and global memory?

In CUDA, given the value of a pointer, or the address of a variable, is there an intrinsic or another API which will introspect which address space the pointer refers to?
Jared Hoberock
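Current CUDA toolkits expose address-space predicate intrinsics for exactly this; a hedged sketch (the kernel and variable names are made up):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int d_globalVar;

__global__ void classifyPointers() {
    __shared__ int sharedVar;
    int localVar;

    // Address-space predicate intrinsics return nonzero when the pointer
    // refers to the corresponding state space.
    printf("globalVar: isGlobal=%u isShared=%u\n",
           __isGlobal(&d_globalVar), __isShared(&d_globalVar));
    printf("sharedVar: isGlobal=%u isShared=%u\n",
           __isGlobal(&sharedVar), __isShared(&sharedVar));
    printf("localVar:  isGlobal=%u isShared=%u\n",
           __isGlobal(&localVar), __isShared(&localVar));
}

int main() {
    classifyPointers<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

__isLocal() and __isConstant() exist as well; on older toolkits the same information could only be reached through the isspacep PTX instructions mentioned in the related question above.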
5 votes, 2 answers

cannot find -lcuda when linking with g++

I'm trying to link these object files with the command: g++ NT_FFT_Decomp.o T_FFT_Decomp.o SNT_FFT_Comp.o ST_FFT_Comp.o VNT_FFT_Comp.o VT_FFT_Comp.o CUDA_FFT_Comp.o Globals.o main.o \ -L/media/wiso/Programs/Setups/CUDA/include -lcuda -lcudart…
mewais
5 votes, 1 answer

Invalid device symbol when copying to CUDA constant memory

I have several files for an app in image processing. As the number of rows and columns of an image does not change while running an image processing algorithm, I was trying to put those values in constant memory. My app looks…
BRabbit27
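A hedged sketch of the usual constant-memory pattern (illustrative names); invalid device symbol typically means something other than the __constant__ variable itself was passed to cudaMemcpyToSymbol, for example a quoted name string, or a symbol living in a different translation unit without device linking:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Constant-memory variables must be at file scope; pass the variable itself
// (not a quoted name) to cudaMemcpyToSymbol.
__constant__ int c_rows;
__constant__ int c_cols;

__global__ void useDims(int *out) {
    out[0] = c_rows * c_cols;
}

int main() {
    int rows = 480, cols = 640;
    cudaError_t err;
    err = cudaMemcpyToSymbol(c_rows, &rows, sizeof(int));
    if (err != cudaSuccess) printf("c_rows: %s\n", cudaGetErrorString(err));
    err = cudaMemcpyToSymbol(c_cols, &cols, sizeof(int));
    if (err != cudaSuccess) printf("c_cols: %s\n", cudaGetErrorString(err));

    int *d_out;
    cudaMalloc(&d_out, sizeof(int));
    useDims<<<1, 1>>>(d_out);
    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}
```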
5 votes, 1 answer

CUDA pow function with integer arguments

I'm new to CUDA and cannot understand what I'm doing wrong. I'm trying to calculate the distance of each object (its id is in one array, its x coordinate in another, and its y coordinate in another) in order to find neighbors for each object: __global__ void dist(int *id_d, int *x_d, int…
Alamin
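A hedged sketch related to the question above (all names illustrative): for squared distances, plain integer multiplication avoids pow() with integer arguments altogether; if a floating-point power is genuinely needed, cast the arguments explicitly, e.g. powf((float)dx, 2.0f):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// For a squared distance there is no need for pow(); multiplication is exact
// for integers and avoids the floating-point pow(int, int) overloads.
__global__ void dist2(const int *x, const int *y, int *out, int n, int qx, int qy) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int dx = x[i] - qx;
        int dy = y[i] - qy;
        out[i] = dx * dx + dy * dy;   // instead of pow(dx, 2) + pow(dy, 2)
    }
}

int main() {
    const int n = 4;
    int hx[n] = {0, 1, 2, 3}, hy[n] = {0, 1, 2, 3}, hout[n];
    int *dx, *dy, *dout;
    cudaMalloc(&dx, n * sizeof(int));
    cudaMalloc(&dy, n * sizeof(int));
    cudaMalloc(&dout, n * sizeof(int));
    cudaMemcpy(dx, hx, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(int), cudaMemcpyHostToDevice);
    dist2<<<1, n>>>(dx, dy, dout, n, 0, 0);
    cudaMemcpy(hout, dout, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%d ", hout[i]);  // squared distances to (0,0)
    printf("\n");
    cudaFree(dx); cudaFree(dy); cudaFree(dout);
    return 0;
}
```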
5 votes, 3 answers

Remote debugging and profiling of CUDA program running on Linux server

This is my scenario: I write my CUDA application on a Windows machine, and I compile and run it on a remote Linux (Debian) server (without graphical output) using PuTTY. I want to ask what the best way is to debug and profile my…
stuhlo
5 votes, 1 answer

Strange result of SURF_GPU and BruteForceMatcher_GPU with knnMatch

OpenCV 2.4.5, CUDA 5.0. I tried to transfer my SURF matcher from the CPU to the GPU and got a strange result. I use knnMatch and findHomography + perspectiveTransform together with my own function, which checks the corners of the bounding box for…
iGriffer
5 votes, 2 answers

The behavior of __CUDA_ARCH__ macro

In host code, it seems that the __CUDA_ARCH__ macro won't generate different code paths; instead, it generates code for exactly the code path for the current device. However, if __CUDA_ARCH__ is used within device code, it will generate different…
user0002128
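A hedged sketch (names illustrative) of how __CUDA_ARCH__ behaves: it is defined only during the device compilation passes, one per target architecture, and is undefined during the host pass, so host code cannot use it to branch on the GPU that happens to be present at runtime:

```cuda
#include <cstdio>

__host__ __device__ int pathTag() {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
    return 700;      // device-code path compiled for sm_70 and newer targets
#elif defined(__CUDA_ARCH__)
    return 1;        // device-code path compiled for older targets
#else
    return 0;        // host compilation pass: __CUDA_ARCH__ is not defined
#endif
}

__global__ void report(int *out) { *out = pathTag(); }

int main() {
    // The branch taken in device code is fixed at compile time for each target
    // architecture; the runtime then selects the embedded code that matches the
    // GPU, if such code was generated.
    printf("host pathTag() = %d\n", pathTag());
    return 0;
}
```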
5 votes, 1 answer

XCode and CUDA integration

Was just wondering if anyone has any experience working with CUDA and XCode? I'm having a nightmare setting it all up... Dawson
Ljdawson
5 votes, 2 answers

How to display pixel arrays in GPU global memory onto screen directly?

I'm writing a path tracer on the GPU, and I have some traced pixel data (an array of float3) in GPU global memory. What I do to display the array on screen is to copy the array to CPU memory and call OpenGL glTexImage2D: glTexImage2D…
Tony
5 votes, 1 answer

CUDA Kernels Randomly Fail, but only when I use certain transcendental functions

I've been working on a CUDA program that randomly crashes with an unspecified launch failure, fairly frequently. Through careful debugging, I localized which kernel was failing, and furthermore that the failure occurred only if certain…
njohn5188