Questions tagged [gpgpu]

GPGPU is an acronym for the field of computer science known as "General Purpose computing on the Graphics Processing Unit (GPU)".

GPGPU is an acronym for the field of computer science known as "General Purpose computing on the Graphics Processing Unit (GPU)". The two biggest manufacturers of GPUs are NVIDIA and AMD, although Intel has recently been moving in this direction with the Haswell APUs. There are two popular frameworks for GPGPU: NVIDIA's CUDA, which is supported only on its own hardware, and OpenCL, developed by the Khronos Group, a consortium that includes AMD, NVIDIA, Intel, Apple, and others. The OpenCL standard is, however, only half-heartedly supported by NVIDIA, so the rivalry among GPU manufacturers is partly mirrored in a rivalry between the programming frameworks.

The attractiveness of using GPUs for other tasks largely stems from the parallel processing capabilities of many modern graphics cards. Some cards have thousands of stream processors operating on similar data at very high rates.

In the past, CPUs emulated threading and multiple data streams by interleaving processing tasks. Over time, CPUs gained multiple cores, each running multiple threads. Modern video cards go much further: a single GPU hosts far more threads, or streams, than most CPUs, together with extremely fast integrated memory. This huge increase in concurrently executing threads is achieved through SIMD, which stands for Single Instruction, Multiple Data. This makes the GPU uniquely suited to heavy computational loads that can be parallelized. It also marks one of the main differences between GPUs and CPUs: each does best what it was designed for.
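As a loose CPU-side analogy (a sketch, not GPU code), NumPy's vectorized operations illustrate the SIMD idea: one instruction applied to many data elements at once, instead of a scalar loop over each element.

```python
import numpy as np

# SIMD idea: a single "instruction" (the multiply below) acts on all
# eight "lanes" of data simultaneously, rather than one element at a time.
a = np.arange(8)      # lanes 0..7
b = np.full(8, 2)
c = a * b             # one vectorized multiply over every lane
print(c.tolist())     # [0, 2, 4, 6, 8, 10, 12, 14]
```

On a GPU the same pattern is scaled up massively: thousands of lanes execute the one instruction in lockstep.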

More information at http://en.wikipedia.org/wiki/GPGPU

2243 questions
10
votes
1 answer

CUDA atomic operation performance in different scenarios

When I came across this question on SO, I was curious to know the answer, so I wrote the piece of code below to test atomic operation performance in different scenarios. The OS is Ubuntu 12.04 with CUDA 5.5 and the device is a GeForce GTX780 (Kepler…
Farzad
  • 3,288
  • 2
  • 29
  • 53
10
votes
2 answers

How to get the assembly code of a CUDA kernel?

I have some kernels that I have written in both OpenCL and CUDA. When running OpenCL programs in the AMD profiler, it allows me to view the assembly code of the kernel. I would like to compare this with the assembly code of the CUDA kernels to…
PseudoPsyche
  • 4,332
  • 5
  • 37
  • 58
10
votes
1 answer

Raytracing in OpenGL via compute shader

I am trying to do some raytracing in OpenGL via the compute shader and I came across a weird problem. At the moment I just want to display a sphere without any shading. My compute shader launches a ray for every pixel and looks like this: #version…
Stan
  • 721
  • 10
  • 24
10
votes
3 answers

Linear Algebra library using OpenGL ES 2.0 for iOS

Does anyone know of a linear algebra library for iOS that uses OpenGL ES 2.0 under the covers? Specifically, I am looking for a way to do matrix multiplication on arbitrary-sized matrices (e.g., much larger than 4x4, more like 5,000 x 100,000)…
cklin
  • 900
  • 4
  • 16
10
votes
1 answer

GPU 2D shared memory dynamic allocation

I am aware of the dynamic allocation for 1D arrays, but how can it be done for 2D arrays? myKernel<<<...>>>(); .... __global__ void myKernel(){ __shared__ float sData[][]; ..... } Say I want to…
Manolete
  • 3,431
  • 7
  • 54
  • 92
10
votes
2 answers

CUDA - why is warp based parallel reduction slower?

I had the idea of a warp-based parallel reduction, since all threads of a warp are in sync by definition. So the idea was that the input data can be reduced by a factor of 64 (each thread reduces two elements) without any need for synchronization. Same…
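The pairwise tree reduction this question describes can be sketched on the CPU in Python (a hypothetical model, not CUDA; it assumes a power-of-two input length, and the lockstep execution of a warp stands in for explicit synchronization):

```python
def tree_reduce(vals):
    """Pairwise tree reduction: at each step, every 'thread' adds two
    neighbouring elements, halving the array until one value remains.
    Assumes len(vals) is a power of two."""
    while len(vals) > 1:
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]

print(tree_reduce(list(range(64))))  # 2016 == sum(range(64))
```

On a GPU, each iteration of the `while` loop corresponds to one lockstep reduction step across the warp's threads.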
djmj
  • 5,579
  • 5
  • 54
  • 92
10
votes
2 answers

What do I need for programming for Tegra GPU

Can I develop CUDA applications for the Tegra 1/2 processors, what do I need for this, and what is the CUDA capability of Tegra 1/2? I have found only the NVIDIA Debug Manager for development in Eclipse for Android, but I do not know whether it can be used for CUDA-style development.
Alex
  • 12,578
  • 15
  • 99
  • 195
10
votes
1 answer

Linking with 3rd party CUDA libraries slows down cudaMalloc

It is not a secret that on CUDA 4.x the first call to cudaMalloc can be ridiculously slow (which was reported several times), seemingly a bug in CUDA drivers. Recently, I noticed weird behaviour: the running time of cudaMalloc directly depends on…
user1545642
10
votes
4 answers

About warp voting function

The CUDA programming guide introduces the concept of warp vote functions: "__all", "__any" and "__ballot". My question is: what applications would use these three functions?
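The semantics of the three warp vote functions can be modelled on the CPU (a Python sketch of the behaviour the CUDA programming guide describes; the function names here are hypothetical stand-ins for the intrinsics):

```python
WARP_SIZE = 32  # lanes per warp on current NVIDIA hardware

def warp_all(preds):
    # __all(p): nonzero iff the predicate holds for every lane in the warp
    return int(all(preds))

def warp_any(preds):
    # __any(p): nonzero iff the predicate holds for at least one lane
    return int(any(preds))

def warp_ballot(preds):
    # __ballot(p): bit i of the result is set iff lane i's predicate holds
    mask = 0
    for lane, p in enumerate(preds):
        if p:
            mask |= 1 << lane
    return mask

preds = [True, False, True] + [False] * (WARP_SIZE - 3)
print(warp_ballot(preds))  # 5 (bits 0 and 2 set)
```

Typical uses include early-exit decisions shared by a whole warp (`__all`/`__any`) and building per-warp occupancy masks for compaction (`__ballot`).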
Fan Zhang
  • 609
  • 1
  • 9
  • 17
9
votes
2 answers

Parameters to CUDA kernels

When invoking a CUDA kernel for a specific thread configuration, are there any strict rules on which memory space (device/host) kernel parameters should reside in and what type they should be? Suppose I launch a 1-D grid of threads with…
smilingbuddha
  • 14,334
  • 33
  • 112
  • 189
9
votes
5 answers

Sparse Cholesky factorization algorithm for GPU

Can anyone provide me with a parallel algorithm for calculating the sparse Cholesky factorization? It must be suitable for execution on a GPU. Any answers in CUDA, OpenCL, or even pseudo-code would be much appreciated.
Jonathan DeCarlo
  • 2,798
  • 1
  • 20
  • 24
9
votes
4 answers

GPGPU programming with OpenGL ES 2.0

I am trying to do some image processing on the GPU, e.g. median, blur, brightness, etc. The general idea is to do something like this framework from GPU Gems 1. I am able to write the GLSL fragment shader for processing the pixels as I've been…
Albus Dumbledore
  • 12,368
  • 23
  • 64
  • 105
9
votes
1 answer

Can I prefetch specific data to a specific cache level in a CUDA kernel?

I understand that Fermi GPUs support prefetching to the L1 or L2 cache. However, in the CUDA reference manual I cannot find anything about it. Does CUDA allow my kernel code to prefetch specific data to a specific level of cache?
dalibocai
  • 2,289
  • 5
  • 29
  • 45
9
votes
1 answer

How many cores in my GPU?

How can you tell how many cores are available in any given GPU? I would prefer a Windows/UI-based answer, but an API-based one (DirectX?) would also be good to know.
MonoThreaded
  • 11,429
  • 12
  • 71
  • 102
9
votes
4 answers

GPU-based inclusive scan on an unbalanced tree

I have the following problem: I need to compute the inclusive scans (e.g. prefix sums) of values based on a tree structure on the GPU. These scans are either from the root node (top-down) or from the leaf nodes (bottom-up). The case of a simple…
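For reference, on a flat array an inclusive scan is just a running sum; the hard part of this question is doing it over a tree structure. A minimal NumPy sketch of the flat case:

```python
import numpy as np

vals = np.array([3, 1, 4, 1, 5])
scan = np.cumsum(vals)  # inclusive scan: element i holds the sum of vals[0..i]
print(scan.tolist())    # [3, 4, 8, 9, 14]
```

The tree variants generalize this: a top-down scan accumulates along root-to-node paths, while a bottom-up scan accumulates over each node's subtree.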
BenC
  • 8,729
  • 3
  • 49
  • 68