Questions tagged [gpgpu]

GPGPU is an acronym for the field of computer science known as "General Purpose computing on the Graphics Processing Unit (GPU)".

GPGPU is an acronym for the field of computer science known as "General Purpose computing on the Graphics Processing Unit (GPU)". The two biggest manufacturers of GPUs are NVIDIA and AMD, although Intel has recently been moving in this direction with the Haswell APUs. There are two popular frameworks for GPGPU: NVIDIA's CUDA, which is supported only on its own hardware, and OpenCL, developed by the Khronos Group, a consortium that includes AMD, NVIDIA, Intel, Apple, and others. The OpenCL standard is, however, only half-heartedly supported by NVIDIA, so the rivalry among GPU manufacturers is partly mirrored in a rivalry between the programming frameworks.

The attractiveness of using GPUs for other tasks largely stems from the parallel processing capabilities of many modern graphics cards. Some cards have thousands of stream processors operating on similar data at very high rates.

In the past, CPUs emulated threading and multiple data streams by interleaving processing tasks. Over time, CPUs gained multiple cores, each running multiple threads. Modern video cards go much further: a single GPU hosts far more threads, or streams, than most CPUs, together with extremely fast integrated memory. This huge increase in concurrently executing threads is achieved through SIMD, which stands for Single Instruction, Multiple Data. This makes the GPU uniquely suited to heavy computational loads that can be parallelized. It also marks one of the main differences between GPUs and CPUs: each does best what it was designed for.
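As a loose CPU-side analogy (a sketch, not GPU code), NumPy's vectorized operations illustrate the SIMD idea: one instruction applied to many data elements at once, instead of a scalar loop over each element.

```python
import numpy as np

# SIMD idea: a single "instruction" (the multiply below) acts on all
# eight "lanes" of data simultaneously, rather than one element at a time.
a = np.arange(8)      # lanes 0..7
b = np.full(8, 2)
c = a * b             # one vectorized multiply over every lane
print(c.tolist())     # [0, 2, 4, 6, 8, 10, 12, 14]
```

On a GPU the same pattern is scaled up massively: thousands of lanes execute the one instruction in lockstep.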

More information at http://en.wikipedia.org/wiki/GPGPU

2243 questions
10
votes
1 answer

CUDA atomic operation performance in different scenarios

When I came across this question on SO, I was curious to know the answer, so I wrote the piece of code below to test atomic operation performance in different scenarios. The OS is Ubuntu 12.04 with CUDA 5.5 and the device is a GeForce GTX780 (Kepler…
Farzad
  • 3,288
  • 2
  • 29
  • 53
10
votes
2 answers

How to get the assembly code of a CUDA kernel?

I have some kernels that I have written in both OpenCL and CUDA. When running OpenCL programs in the AMD profiler, it allows me to view the assembly code of the kernel. I would like to compare this with the assembly code of the CUDA kernels to…
PseudoPsyche
  • 4,332
  • 5
  • 37
  • 58
10
votes
1 answer

Raytracing in OpenGL via compute shader

I am trying to do some raytracing in OpenGL via the compute shader and I came across a weird problem. At the moment I just want to display a sphere without any shading. My compute shader launches a ray for every pixel and looks like this: #version…
Stan
  • 721
  • 10
  • 24
10
votes
3 answers

Linear Algebra library using OpenGL ES 2.0 for iOS

Does anyone know of a linear algebra library for iOS that uses OpenGL ES 2.0 under the covers? Specifically, I am looking for a way to do matrix multiplication on arbitrary-sized matrices (e.g., much larger than 4x4, more like 5,000 x 100,000)…
cklin
  • 900
  • 4
  • 16
10
votes
1 answer

GPU 2D shared memory dynamic allocation

I am aware of the dynamic allocation for 1D arrays, but how can it be done for 2D arrays? myKernel<<<...>>>(); .... __global__ void myKernel(){ __shared__ float sData[][]; ..... } Say I want to…
Manolete
  • 3,431
  • 7
  • 54
  • 92
10
votes
2 answers

CUDA - why is warp based parallel reduction slower?

I had the idea of a warp-based parallel reduction, since all threads of a warp are in sync by definition. So the idea was that the input data can be reduced by a factor of 64 (each thread reduces two elements) without any need for synchronization. Same…
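The pairwise tree reduction this question describes can be sketched on the CPU in Python (a hypothetical model, not CUDA; it assumes a power-of-two input length, and the lockstep execution of a warp stands in for explicit synchronization):

```python
def tree_reduce(vals):
    """Pairwise tree reduction: at each step, every 'thread' adds two
    neighbouring elements, halving the array until one value remains.
    Assumes len(vals) is a power of two."""
    while len(vals) > 1:
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]

print(tree_reduce(list(range(64))))  # 2016 == sum(range(64))
```

On a GPU, each iteration of the `while` loop corresponds to one lockstep reduction step across the warp's threads.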
djmj
  • 5,579
  • 5
  • 54
  • 92
10
votes
2 answers

What do I need for programming for Tegra GPU

Can I develop CUDA applications for the Tegra 1/2 processors, what do I need for this, and what is the CUDA capability of Tegra 1/2? I have found only the NVIDIA Debug Manager for development in Eclipse for Android, but I do not know whether it can be used for CUDA-style development.
Alex
  • 12,578
  • 15
  • 99
  • 195
10
votes
1 answer

Linking with 3rd party CUDA libraries slows down cudaMalloc

It is not a secret that on CUDA 4.x the first call to cudaMalloc can be ridiculously slow (which was reported several times), seemingly a bug in CUDA drivers. Recently, I noticed weird behaviour: the running time of cudaMalloc directly depends on…
user1545642
10
votes
4 answers

About warp voting function

The CUDA programming guide introduces the concept of warp vote functions: "__all", "__any" and "__ballot". My question is: what applications would use these three functions?
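The semantics of the three warp vote functions can be modelled on the CPU (a Python sketch of the behaviour the CUDA programming guide describes; the function names here are hypothetical stand-ins for the intrinsics):

```python
WARP_SIZE = 32  # lanes per warp on current NVIDIA hardware

def warp_all(preds):
    # __all(p): nonzero iff the predicate holds for every lane in the warp
    return int(all(preds))

def warp_any(preds):
    # __any(p): nonzero iff the predicate holds for at least one lane
    return int(any(preds))

def warp_ballot(preds):
    # __ballot(p): bit i of the result is set iff lane i's predicate holds
    mask = 0
    for lane, p in enumerate(preds):
        if p:
            mask |= 1 << lane
    return mask

preds = [True, False, True] + [False] * (WARP_SIZE - 3)
print(warp_ballot(preds))  # 5 (bits 0 and 2 set)
```

Typical uses include early-exit decisions shared by a whole warp (`__all`/`__any`) and building per-warp occupancy masks for compaction (`__ballot`).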
Fan Zhang
  • 609
  • 1
  • 9
  • 17
9
votes
2 answers

Parameters to CUDA kernels

When invoking a CUDA kernel for a specific thread configuration, are there any strict rules on which memory space (device/host) kernel parameters should reside in and what type they should be? Suppose I launch a 1-D grid of threads with…
smilingbuddha
  • 14,334
  • 33
  • 112
  • 189
9
votes
5 answers

Sparse Cholesky factorization algorithm for GPU

Can anyone provide me with a parallel algorithm for calculating the sparse Cholesky factorization? It must be suitable for execution on a GPU. Any answers in CUDA, OpenCL, or even pseudo-code would be much appreciated.
Jonathan DeCarlo
  • 2,798
  • 1
  • 20
  • 24
9
votes
4 answers

GPGPU programming with OpenGL ES 2.0

I am trying to do some image processing on the GPU, e.g. median, blur, brightness, etc. The general idea is to do something like this framework from GPU Gems 1. I am able to write the GLSL fragment shader for processing the pixels as I've been…
Albus Dumbledore
  • 12,368
  • 23
  • 64
  • 105
9
votes
1 answer

Can I prefetch specific data to a specific cache level in a CUDA kernel?

I understand that Fermi GPUs support prefetching to the L1 or L2 cache. However, in the CUDA reference manual I cannot find anything about it. Does CUDA allow my kernel code to prefetch specific data to a specific level of cache?
dalibocai
  • 2,289
  • 5
  • 29
  • 45
9
votes
1 answer

How many cores in my GPU?

How can you tell how many cores are available in any given GPU? I would prefer a Windows/UI-based answer, but an API-based one (DirectX?) would also be good to know.
MonoThreaded
  • 11,429
  • 12
  • 71
  • 102
9
votes
4 answers

GPU-based inclusive scan on an unbalanced tree

I have the following problem: I need to compute the inclusive scans (e.g. prefix sums) of values based on a tree structure on the GPU. These scans are either from the root node (top-down) or from the leaf nodes (bottom-up). The case of a simple…
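For reference, on a flat array an inclusive scan is just a running sum; the hard part of this question is doing it over a tree structure. A minimal NumPy sketch of the flat case:

```python
import numpy as np

vals = np.array([3, 1, 4, 1, 5])
scan = np.cumsum(vals)  # inclusive scan: element i holds the sum of vals[0..i]
print(scan.tolist())    # [3, 4, 8, 9, 14]
```

The tree variants generalize this: a top-down scan accumulates along root-to-node paths, while a bottom-up scan accumulates over each node's subtree.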
BenC
  • 8,729
  • 3
  • 49
  • 68