Questions tagged [gpgpu]

GPGPU is an acronym for the field of computer science known as "General Purpose computing on the Graphics Processing Unit (GPU)"

The two biggest manufacturers of GPUs are NVIDIA and AMD, although Intel has recently been moving in this direction, for example with the integrated graphics in its Haswell processors. There are two popular frameworks for GPGPU: NVIDIA's CUDA, which is supported only on NVIDIA hardware, and OpenCL, developed by the Khronos Group, a consortium that includes AMD, NVIDIA, Intel, Apple, and others. However, the OpenCL standard is only half-heartedly supported by NVIDIA, so the rivalry among GPU manufacturers is partly mirrored in a rivalry between the programming frameworks.

The attractiveness of using GPUs for other tasks stems largely from the parallel processing capabilities of modern graphics cards: a single card can contain thousands of stream processors, all applying the same operations to different data at very high rates.

In the past, CPUs emulated multiple threads and data streams by interleaving processing tasks on a single core. Over time, CPUs gained multiple cores, each running multiple threads. Modern video cards go much further: they contain many processing units hosting far more threads than any CPU, integrated with extremely fast memory. This huge number of concurrently executing threads is achieved through SIMD (Single Instruction, Multiple Data), in which one instruction stream operates on many data elements at once. The result is an environment uniquely suited to heavy computational loads that can be parallelized, and this design is also one of the main differences between GPUs and CPUs: each does best what it was designed for.

More information at http://en.wikipedia.org/wiki/GPGPU
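The SIMD, data-parallel programming model described above can be sketched in plain Python. This is a toy CPU simulation of the idea, not real GPU code: one "kernel" function is written from the point of view of a single index, and a launcher runs it over every element. On a real GPU these invocations would execute in parallel across thousands of threads; the names `saxpy_kernel` and `launch` are illustrative and not part of any actual framework.

```python
# Toy model of the SIMD / data-parallel style used by GPGPU frameworks:
# one kernel function, executed once per index, touching different data
# each time while performing the same instructions.

def saxpy_kernel(i, a, x, y, out):
    # The same instruction for every "thread"; only the data (index i) differs.
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args):
    # A real GPU would run these bodies in parallel across many threads;
    # here we simply loop to show the programming model.
    for i in range(n):
        kernel(i, *args)

n = 4
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * n
launch(saxpy_kernel, n, 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

In CUDA or OpenCL the loop disappears entirely: each hardware thread receives its own index and executes the kernel body once, which is why workloads that map cleanly onto independent elements parallelize so well.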

2243 questions
19
votes
2 answers

NVIDIA CUDA Video Encoder (NVCUVENC) input from device texture array

I am modifying the CUDA Video Encoder (NVCUVENC) encoding sample found in the SDK samples pack so that the data comes not from external yuv files (as is done in the sample) but from a cudaArray which is filled from a texture. So the key API method that encodes…
Michael IV
  • 11,016
  • 12
  • 92
  • 223
18
votes
1 answer

The variation of cache misses in GPU

I have been toying with an OpenCL kernel that accesses 7 global memory buffers, does something with the values, and stores the result back to an 8th global memory buffer. As I observed, as the input size increases, the L1 cache miss ratio (= misses/(misses + hits))…
Zk1001
  • 2,033
  • 4
  • 19
  • 36
18
votes
2 answers

Error using Tensorflow with GPU

I've tried a bunch of different Tensorflow examples, which work fine on the CPU but generate the same error when I try to run them on the GPU. One little example is this: import tensorflow as tf # Creates a graph. a = tf.constant([1.0, 2.0,…
user5654767
  • 271
  • 1
  • 3
  • 6
18
votes
2 answers

Continuous Integration Service for GPU package?

Continuous integration services are wonderful for continually testing updates to packages for various languages. These include services like Travis-CI, Jenkins, and Shippable among many others. However, as I have explored these different services…
cdeterman
  • 19,630
  • 7
  • 76
  • 100
18
votes
4 answers

Double precision floating point in CUDA

Does CUDA support double precision floating point numbers? Also, what are the reasons for this?
cuda-dev
  • 181
  • 1
  • 1
  • 3
18
votes
3 answers

Why does CUDA code run so much faster in NVIDIA Visual Profiler?

A piece of code that takes well over 1 minute on the command line was done in a matter of seconds in NVIDIA Visual Profiler (running the same .exe). So the natural question is: why? Is there something wrong with the command line, or does Visual Profiler…
mchen
  • 9,808
  • 17
  • 72
  • 125
18
votes
2 answers

Numpy, BLAS and CUBLAS

Numpy can be "linked/compiled" against different BLAS implementations (MKL, ACML, ATLAS, GotoBlas, etc). That's not always straightforward to configure but it is possible. Is it also possible to "link/compile" numpy against NVIDIA's CUBLAS…
Ümit
  • 17,379
  • 7
  • 55
  • 74
17
votes
5 answers

Is it worth offloading FFT computation to an embedded GPU?

We are considering porting an application from a dedicated digital signal processing chip to run on generic x86 hardware. The application does a lot of Fourier transforms, and from brief research, it appears that FFTs are fairly well suited to…
Ian Renton
  • 699
  • 2
  • 8
  • 21
17
votes
4 answers

Any Lisp extensions for CUDA?

I just noted that one of the first languages for the Connection-Machine of W.D. Hillis was *Lisp, an extension of Common Lisp with parallel constructs. The Connection-Machine was a massively parallel computer with SIMD architecture, much the same as…
Halberdier
  • 1,164
  • 11
  • 15
17
votes
3 answers

Basic GPU application, integer calculations

Long story short, I have done several prototypes of interactive software. I use pygame now (a python SDL wrapper) and everything is done on the CPU. I am starting to port it to C now, and at the same time I am searching for existing possibilities to use some…
Mikhail V
  • 1,416
  • 1
  • 14
  • 23
17
votes
1 answer

__forceinline__ effect at CUDA C __device__ functions

There is a lot of advice on when to use inline functions and when to avoid them in regular C coding. What is the effect of __forceinline__ on CUDA C __device__ functions? Where should it be used and where should it be avoided?
Farzad
  • 3,288
  • 2
  • 29
  • 53
17
votes
4 answers

printing from cuda kernels

I am writing a CUDA program and trying to print something inside the CUDA kernels using the printf function. But when I compile the program I get the error: calling a host function("printf") from a __device__/__global__…
duttasankha
  • 717
  • 2
  • 10
  • 32
17
votes
1 answer

Is there memory protection on GPUs

I don't have much experience with GPUs so please forgive my ignorance. Nowadays, GPUs are being used as GPGPUs for general purpose programming. But I was wondering if GPUs have memory protection and virtualization mechanisms. I mean, for example, you…
pythonic
  • 20,589
  • 43
  • 136
  • 219
16
votes
3 answers

In OpenCL, what does mem_fence() do, as opposed to barrier()?

Unlike barrier() (which I think I understand), mem_fence() does not affect all items in the work group. The OpenCL spec says (section 6.11.10), for mem_fence(): Orders loads and stores of a work-item executing a kernel. (so it applies to a single…
andrew cooke
  • 45,717
  • 10
  • 93
  • 143
16
votes
2 answers

Why does my OpenCL kernel fail on the nVidia driver, but not Intel (possible driver bug)?

I originally wrote an OpenCL program to calculate very large hermitian matrices, where the kernel calculates a single pair of entries in the matrix (the upper triangular portion, and its lower triangular complement). Very early on, I found a very…
stix
  • 1,140
  • 13
  • 36