Questions tagged [gpgpu]

GPGPU is an acronym for the field of computer science known as "General Purpose computing on the Graphics Processing Unit (GPU)"

The two biggest manufacturers of GPUs are NVIDIA and AMD, although Intel has also been moving in this direction, most recently with its Haswell APUs. There are two popular frameworks for GPGPU: NVIDIA's CUDA, which is supported only on its own hardware, and OpenCL, developed by the Khronos Group, a consortium that includes AMD, NVIDIA, Intel, Apple, and others. NVIDIA supports the OpenCL standard only half-heartedly, so the rivalry among GPU manufacturers is partially mirrored in a rivalry between the programming frameworks.

The attractiveness of using GPUs for other tasks largely stems from the parallel processing capabilities of modern graphics cards: a single card can contain thousands of stream processors applying the same operations to different data at very high throughput.

In the past, CPUs emulated threading and multiple data streams by interleaving processing tasks on a single core. Over time, CPUs gained multiple cores, each running multiple hardware threads. Modern video cards go much further: they combine many processing cores with fast integrated memory, hosting far more concurrent threads or streams than a CPU. This huge increase in the number of threads in flight is achieved through the SIMD technique, which stands for Single Instruction, Multiple Data: many threads execute the same instruction on different data elements. This makes the GPU uniquely suited to heavy computational loads that can be parallelized. It also marks one of the main differences between GPUs and CPUs - each does best what it was designed for.
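
The SIMD-style data parallelism described above can be illustrated with a minimal CUDA kernel (a sketch; the kernel and variable names are illustrative): every thread executes the same instruction, each on a different array element.

```cuda
// SAXPY: y = a*x + y, one element per thread (illustrative sketch).
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overrun
        y[i] = a * x[i] + y[i];
}

// Host side: launch enough blocks of 256 threads to cover n elements.
// saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```

Thousands of such threads run concurrently, which is exactly the workload shape GPUs were designed for.
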

More information at http://en.wikipedia.org/wiki/GPGPU

2243 questions
25
votes
7 answers

OpenCL - How do I query for a device's SIMD width?

In CUDA, there is a concept of a warp, which is defined as the maximum number of threads that can execute the same instruction simultaneously within a single processing element. For NVIDIA, this warp size is 32 for all of their cards currently on…
Jonathan DeCarlo
  • 2,798
  • 1
  • 20
  • 24
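
One commonly cited OpenCL counterpart to CUDA's warp size is the preferred work-group size multiple. A minimal host-side sketch, assuming `kernel` and `device` have already been created and with error checking omitted:

```c
/* Query the preferred work-group size multiple for a compiled kernel;
   on many GPUs this corresponds to the hardware SIMD width / warp size. */
size_t simd_width = 0;
clGetKernelWorkGroupInfo(kernel, device,
                         CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                         sizeof(simd_width), &simd_width, NULL);
printf("preferred work-group size multiple: %zu\n", simd_width);
```

Note this is a per-kernel query (OpenCL 1.1+), not a device-wide constant, and the standard does not guarantee it equals the SIMD width on every implementation.
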
24
votes
4 answers

What is the point of GLSL when there is OpenCL?

Consider this the complete form of the question in the title: Since OpenCL may be the common standard for serious GPU programming in the future (among other devices programming), why not when programming for OpenGL - in a future-proof way - utilize…
j riv
  • 3,593
  • 6
  • 39
  • 54
24
votes
2 answers

What is coherent memory on GPU?

I have stumbled more than once on the terms "non-coherent" and "coherent" memory in tech papers related to graphics programming. I have been searching for a simple and clear explanation, but found mostly 'hardcore' papers of this type. I would be glad to…
Michael IV
  • 11,016
  • 12
  • 92
  • 223
24
votes
2 answers

Clarification of the leading dimension in CUBLAS when transposing

For a matrix A, the documentation only states that the corresponding leading dimension parameter lda refers to the: leading dimension of two-dimensional array used to store the matrix A Thus I presume this is just the number of rows of A given…
mchen
  • 9,808
  • 17
  • 72
  • 125
24
votes
2 answers

How to measure the inner kernel time in NVIDIA CUDA?

I want to measure time inside a kernel on the GPU; how do I measure it in NVIDIA CUDA? e.g. __global__ void kernelSample() { some code here get start time some code here get stop time some code here }
Amin
  • 371
  • 1
  • 2
  • 7
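
One device-side option is the cycle counter exposed by CUDA's clock64() intrinsic. A sketch filling in the pattern from the question (names are illustrative; results are in GPU clock cycles, read per multiprocessor):

```cuda
// Sketch: in-kernel timing with the device-side clock64() counter.
__global__ void kernelSample(long long *elapsed) {
    long long start = clock64();   // read cycle counter before the work
    // ... some code here ...
    long long stop = clock64();    // read it again afterwards
    if (threadIdx.x == 0 && blockIdx.x == 0)
        *elapsed = stop - start;   // report one thread's measurement
}
```

Cycle counts from different multiprocessors are not directly comparable; for whole-kernel wall-clock time, host-side cudaEvent timers are the usual alternative.
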
23
votes
1 answer

What is the context switching mechanism in GPU?

As I understand it, GPUs switch between warps to hide memory latency. But I wonder under which conditions a warp will be switched out. For example, if a warp performs a load and the data is already there in the cache, is the warp switched out or does it continue…
Zk1001
  • 2,033
  • 4
  • 19
  • 36
23
votes
5 answers

How to create or manipulate GPU assembler?

Does any one have experience in creating/manipulating GPU machine code, possibly at run-time? I am interested in modifying GPU assembler code, possibly at run time with minimal overhead. Specifically I'm interested in assembler based genetic…
zenna
  • 9,006
  • 12
  • 73
  • 101
23
votes
5 answers

How to dynamically allocate arrays inside a kernel?

I need to dynamically allocate some arrays inside the kernel function. How can I do that? My code is something like this: __global__ func(float *grid_d,int n, int nn){ int i,j; float x[n],y[nn]; //Do some really cool and heavy…
Granada
  • 363
  • 1
  • 3
  • 7
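
Variable-length arrays like float x[n] are not valid in device code, but CUDA does support heap allocation inside a kernel on compute capability 2.0 and later. A sketch of that alternative, reusing the question's names:

```cuda
// Sketch: device-side malloc/free (requires compute capability >= 2.0).
__global__ void func(float *grid_d, int n, int nn) {
    float *x = (float *)malloc(n  * sizeof(float));
    float *y = (float *)malloc(nn * sizeof(float));
    if (x && y) {
        // ... do some really cool and heavy work with x and y ...
    }
    free(x);  // free(NULL) is a no-op, so this is safe either way
    free(y);
}

// The device heap is small by default; enlarge it from the host if needed:
// cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 * 1024 * 1024);
```

In-kernel malloc is slow relative to indexing into a pre-allocated buffer, so a single host-side cudaMalloc carved up per thread is often the faster design.
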
21
votes
4 answers

CUDA Block and Grid size efficiencies

What is the advised way of dealing with dynamically-sized datasets in CUDA? Is it a case of 'set the block and grid sizes based on the problem set' or is it worthwhile to assign block dimensions as factors of 2 and have some in-kernel logic to deal…
Bolster
  • 7,460
  • 13
  • 61
  • 96
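
The usual pattern for dynamically-sized data combines both ideas in the question: pick a fixed block size, round the grid up to cover all elements, and guard the tail inside the kernel. A minimal sketch:

```cuda
// Sketch: one thread per element, with an in-kernel bounds check
// handling the final, partially-filled block.
__global__ void process(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard: the last block may overshoot n
        data[i] *= 2.0f;
}

// Host: fixed block size, grid rounded up with ceiling division.
// int block = 256;
// int grid  = (n + block - 1) / block;
// process<<<grid, block>>>(d_data, n);
```
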
20
votes
1 answer

CPU vs GPU - when is the CPU better?

I know many examples where the GPU is much faster than the CPU. But there exist algorithms (problems) which are very hard to parallelise. Could you give me some examples or tests where the CPU can overcome the GPU? Edit: Thanks for suggestions! We can make a comparison…
tynk
  • 211
  • 2
  • 5
20
votes
1 answer

How to use pinned memory / mapped memory in OpenCL

In order to reduce the transfer time from host to device for my application, I want to use pinned memory. NVIDIA's best practices guide proposes mapping buffers and writing the data using the following code: cDataIn = (unsigned…
krisg
  • 271
  • 2
  • 11
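
The pattern the NVIDIA guide describes is: allocate the buffer with CL_MEM_ALLOC_HOST_PTR, then map it to obtain a host pointer backed by pinned memory. A sketch, assuming `context`, `queue`, and `nbytes` already exist and with error handling omitted:

```c
/* Sketch of the pinned/mapped-memory pattern for faster host->device
   transfers. The driver may back this allocation with pinned memory. */
cl_int err;
cl_mem pinned = clCreateBuffer(context,
                               CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR,
                               nbytes, NULL, &err);
unsigned char *cDataIn =
    (unsigned char *)clEnqueueMapBuffer(queue, pinned, CL_TRUE,
                                        CL_MAP_WRITE, 0, nbytes,
                                        0, NULL, NULL, &err);
/* ... fill cDataIn, enqueue the transfer or kernel, then unmap: */
clEnqueueUnmapMemObject(queue, pinned, cDataIn, 0, NULL, NULL);
```

Whether the mapping is actually pinned is implementation-defined; the OpenCL standard only guarantees the map/unmap semantics.
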
20
votes
2 answers

When to use volatile with shared CUDA Memory

Under what circumstances should you use the volatile keyword with a CUDA kernel's shared memory? I understand that volatile tells the compiler never to cache any values, but my question is about the behavior with a shared array: __shared__ float…
Taj Morton
  • 1,588
  • 4
  • 18
  • 26
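
The canonical case where volatile matters is warp-synchronous code, such as the final steps of the classic parallel reduction. A sketch (on modern CUDA, explicit __syncwarp() is the recommended replacement for this idiom):

```cuda
// Sketch: without `volatile`, the compiler may keep sdata values in
// registers across these unsynchronized steps, so threads would not
// observe each other's writes to shared memory.
__device__ void warpReduce(volatile float *sdata, int tid) {
    sdata[tid] += sdata[tid + 32];
    sdata[tid] += sdata[tid + 16];
    sdata[tid] += sdata[tid + 8];
    sdata[tid] += sdata[tid + 4];
    sdata[tid] += sdata[tid + 2];
    sdata[tid] += sdata[tid + 1];
}
```
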
19
votes
2 answers

How are 2D / 3D CUDA blocks divided into warps?

If I start my kernel with a grid whose blocks have dimensions: dim3 block_dims(16,16); How are the grid blocks now split into warps? Do the first two rows of such a block form one warp, or the first two columns, or is this arbitrarily-ordered?…
Gabriel
  • 8,990
  • 6
  • 57
  • 101
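
Threads are linearized in row-major order (x fastest, then y, then z), and consecutive groups of 32 form a warp. So for a 16x16 block, rows y=0 and y=1 make up warp 0, rows y=2 and y=3 warp 1, and so on. A sketch that records each thread's warp:

```cuda
// Sketch: compute the linear thread index inside the block, then the
// warp it belongs to. warpSize is 32 on all current NVIDIA hardware.
__global__ void whichWarp(int *warp_of_thread) {
    int linear = threadIdx.z * blockDim.y * blockDim.x
               + threadIdx.y * blockDim.x
               + threadIdx.x;
    warp_of_thread[linear] = linear / warpSize;
}
```
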
19
votes
1 answer

How many threads (or work-items) can run at the same time?

I'm new to GPGPU programming and I'm working with NVIDIA's implementation of OpenCL. My question is how to compute the limit of a GPU device (in number of threads). From what I understood there are a number of work-groups (equivalent of blocks in…
Laure Jonchery
  • 266
  • 1
  • 3
  • 5
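
The device limits that bound concurrency can be read with clGetDeviceInfo. A sketch, assuming `device` was obtained via clGetDeviceIDs and with error checking omitted:

```c
/* Sketch: query per-device limits relevant to how many work-items
   the hardware can service. */
cl_uint compute_units = 0;
size_t  max_wg_size   = 0;
clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                sizeof(compute_units), &compute_units, NULL);
clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                sizeof(max_wg_size), &max_wg_size, NULL);
printf("compute units: %u, max work-group size: %zu\n",
       compute_units, max_wg_size);
```

Note these are upper bounds per compute unit and per work-group, not a direct count of simultaneously resident work-items; the latter also depends on register and local-memory usage.
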
19
votes
10 answers

Have you successfully used a GPGPU?

I am interested to know whether anyone has written an application that takes advantage of a GPGPU by using, for example, nVidia CUDA. If so, what issues did you find and what performance gains did you achieve compared with a standard CPU?
John Channing
  • 6,501
  • 7
  • 45
  • 56