Questions tagged [gpgpu]

GPGPU is an acronym for the field of computer science known as "General Purpose computing on the Graphics Processing Unit (GPU)"

The two biggest manufacturers of GPUs are NVIDIA and AMD, although Intel has recently been moving in this direction with the integrated graphics in its Haswell processors. There are two popular frameworks for GPGPU: NVIDIA's CUDA, which is supported only on NVIDIA hardware, and OpenCL, developed by the Khronos Group. Khronos is a consortium whose members include AMD, NVIDIA, Intel, Apple and others, but NVIDIA supports the OpenCL standard only half-heartedly, so the rivalry among GPU manufacturers is partly mirrored in the rivalry between programming frameworks.

The appeal of using GPUs for non-graphics tasks stems largely from the parallel processing capabilities of modern graphics cards. Some cards can run thousands of threads at once, each applying the same operations to different data.

In the past, CPUs emulated multithreading by interleaving tasks on a single core. Over time, CPUs gained multiple cores, each able to run multiple threads. A modern video card houses a GPU with many more cores, hosting far more threads or streams than most CPUs, together with very fast integrated memory. This large increase in the number of executing threads is achieved through SIMD (Single Instruction, Multiple Data), in which one instruction is applied to many data elements at once. The result is an environment well suited to heavy computational loads that can be parallelized. This technique also marks one of the main differences between GPUs and CPUs: each does best what it was designed for.
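As a rough illustration of the "single instruction, multiple data" idea described above, the following Python sketch emulates, serially on the CPU, how a GPU-style kernel applies one piece of code to many data elements. It is not CUDA or OpenCL code: the names `saxpy_kernel`, `launch` and `saxpy`, and the block size of 4, are hypothetical and chosen only for this example.

```python
# Sketch only: emulates the data-parallel launch model on the CPU.
# saxpy_kernel, launch and saxpy are hypothetical names, not part of
# CUDA, OpenCL or any other real GPGPU API.

def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    """Body that every logical thread executes on its own element."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(x):                          # guard: the last block may overhang
        out[i] = a * x[i] + y[i]


def launch(kernel, num_blocks, block_dim, *args):
    """Run the kernel once per (block, thread) pair.

    A GPU would execute these iterations in parallel; this loop runs
    them one after another purely for illustration.
    """
    for block_idx in range(num_blocks):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


def saxpy(a, x, y, block_dim=4):
    """Compute a*x + y element-wise using the emulated launch."""
    n = len(x)
    out = [0.0] * n
    num_blocks = (n + block_dim - 1) // block_dim  # ceiling division
    launch(saxpy_kernel, num_blocks, block_dim, a, x, y, out)
    return out


if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0, 4.0, 5.0],
                [10.0, 20.0, 30.0, 40.0, 50.0]))
```

The point of the sketch is that the kernel body contains no loop over the data: each logical thread derives its own index and touches one element, which is what lets real hardware run many such threads at the same time.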

More information at http://en.wikipedia.org/wiki/GPGPU

2243 questions
52
votes
3 answers

CUDA: How many concurrent threads in total?

I have a GeForce GTX 580, and I want to make a statement about the total number of threads that can (ideally) actually be run in parallel, to compare with 2 or 4 multi-core CPUs. deviceQuery gives me the following possibly relevant information:…
Eskil
  • 3,385
  • 5
  • 28
  • 32
52
votes
2 answers

Running more than one CUDA applications on one GPU

The CUDA documentation does not specify how many CUDA processes can share one GPU. For example, if the same user launches more than one CUDA program with only one GPU card installed in the system, what is the effect? Will it guarantee the correctness of…
cache
  • 1,239
  • 3
  • 13
  • 21
50
votes
3 answers

CUDA model - what is warp size?

What's the relationship between maximum work group size and warp size? Let’s say my device has 240 CUDA streaming processors (SP) and returns the following information - CL_DEVICE_MAX_COMPUTE_UNITS: 30 CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 /…
r00kie
  • 843
  • 1
  • 11
  • 12
46
votes
4 answers

CUDA Driver API vs. CUDA runtime

When writing CUDA applications, you can either work at the driver level or at the runtime level as illustrated on this image (The libraries are CUFFT and CUBLAS for advanced math): (source: tomshw.it) I assume the tradeoff between the two are…
Morten Christiansen
  • 19,002
  • 22
  • 69
  • 94
45
votes
1 answer

Choosing between GeForce or Quadro GPUs to do machine learning via TensorFlow

Is there any noticeable difference in TensorFlow performance if using Quadro GPUs vs GeForce GPUs? e.g. does it use double precision operations or something else that would cause a drop in GeForce cards? I am about to buy a GPU for TensorFlow, and…
user2771184
  • 709
  • 1
  • 5
  • 9
41
votes
4 answers

How does CUDA assign device IDs to GPUs?

When a computer has multiple CUDA-capable GPUs, each GPU is assigned a device ID. By default, CUDA kernels execute on device ID 0. You can use cudaSetDevice(int device) to select a different device. Let's say I have two GPUs in my machine: a GTX 480…
solvingPuzzles
  • 8,541
  • 16
  • 69
  • 112
38
votes
3 answers

GPGPU vs. Multicore?

What are the key practical differences between GPGPU and regular multicore/multithreaded CPU programming, from the programmer's perspective? Specifically: What types of problems are better suited to regular multicore and what types are better…
dsimcha
  • 67,514
  • 53
  • 213
  • 334
36
votes
2 answers

Should I unify two similar kernels with an 'if' statement, risking performance loss?

I have 2 very similar kernel functions, in the sense that the code is nearly the same, but with a slight difference. Currently I have 2 options: (1) write 2 different (but very similar) methods, or (2) write a single kernel and put the code blocks…
lina
  • 1,679
  • 4
  • 21
  • 25
36
votes
4 answers

What is the current status of C++ AMP

I am working on high performance code in C++ and have been using both CUDA and OpenCL and more recently C++AMP, which I like very much. I am however a little worried that it is not being developed and extended and will die out. What leads me to this…
JoeTaicoon
  • 1,383
  • 1
  • 12
  • 28
33
votes
2 answers

OpenCL vs OpenMP performance

Have there been any studies comparing OpenCL to OpenMP performance? Specifically I am interested in the overhead cost of launching threads with OpenCL, e.g., if one were to decompose the domain into a very large number of individual work items (each…
Robert
  • 673
  • 2
  • 7
  • 8
32
votes
8 answers

How to use OpenCL on Android?

For platform independence (desktop, cloud, mobile, ...) it would be great to use OpenCL for GPGPU development when speed does matter. I know Google pushes RenderScript as an alternative, but it seems to only be available for Android and is…
Rodja
  • 7,998
  • 8
  • 48
  • 55
32
votes
8 answers

CUDA apps time out & fail after several seconds - how to work around this?

I've noticed that CUDA applications tend to have a rough maximum run-time of 5-15 seconds before they will fail and exit out. I realize it's ideal not to have a CUDA application run that long, but assuming that it is the correct choice to use CUDA and…
rck
  • 2,020
  • 2
  • 23
  • 23
30
votes
4 answers

Python real time image classification problems with Neural Networks

I'm attempting to use Caffe and Python to do real-time image classification. I'm using OpenCV to stream from my webcam in one process, and in a separate process, using Caffe to perform image classification on the frames pulled from the webcam. Then I'm…
user3543300
  • 499
  • 2
  • 9
  • 27
28
votes
7 answers

How to obtain OpenCL SDK?

I was perusing http://www.khronos.org/ web site and only found headers for OpenCL (not OpenGL which I don't care about). How can I obtain OpenCL SDK?
Roman Kagan
  • 10,440
  • 26
  • 86
  • 126
26
votes
1 answer

Integer calculations on GPU

For my work it's particularly interesting to do integer calculations, which obviously are not what GPUs were made for. My question is: Do modern GPUs support efficient integer operations? I realize this should be easy to figure out for myself, but I…
gspr
  • 11,144
  • 3
  • 41
  • 74