Questions tagged [gpgpu]

GPGPU is an acronym for the field of computer science known as "General-Purpose computing on the Graphics Processing Unit (GPU)". The two biggest manufacturers of GPUs are NVIDIA and AMD, although Intel has also been moving in this direction with the integrated graphics in its Haswell-generation processors. There are two popular frameworks for GPGPU: NVIDIA's CUDA, which is supported only on NVIDIA's own hardware, and OpenCL, developed by the Khronos Group, a consortium that includes AMD, NVIDIA, Intel, Apple and others. The OpenCL standard is only half-heartedly supported by NVIDIA, so the rivalry among GPU manufacturers is partly mirrored in a rivalry between the programming frameworks.

The attractiveness of using GPUs for other tasks largely stems from the parallel processing capabilities of many modern graphics cards: some cards contain thousands of stream processors operating on similar data at very high rates.

In the past, CPUs emulated multithreading and multiple data streams by interleaving processing tasks on a single core. Over time, we gained multiple cores, each running multiple threads. Modern video cards integrate one or more GPUs with extremely fast memory and host far more concurrent threads or streams than most CPUs. This huge number of threads in flight is achieved through SIMD (Single Instruction, Multiple Data): the same instruction is applied to many data elements at once. This makes the GPU uniquely suited to heavy computational loads that can be parallelized, and it also marks one of the main differences between GPUs and CPUs: each does best what it was designed for.
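The SIMD idea above can be sketched with a tiny host-side example. Here numpy's vectorized arithmetic stands in for a GPU kernel; this illustrates the one-instruction-many-elements programming model, not actual GPU execution, and the function names are made up for the illustration:

```python
import numpy as np

def saxpy_scalar(a, x, y):
    # CPU-style scalar loop: one element processed per iteration.
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

def saxpy_simd(a, x, y):
    # SIMD-style: a single expression applied to every element at once.
    # On a GPU, each element would map to one thread of the same kernel.
    return a * x + y

x = np.arange(4, dtype=np.float32)   # [0, 1, 2, 3]
y = np.ones(4, dtype=np.float32)     # [1, 1, 1, 1]
print(saxpy_simd(2.0, x, y))         # [1. 3. 5. 7.]
```

Both functions compute the same SAXPY result; the difference is that the second expresses the computation as one operation over all the data, which is exactly the shape of work a GPU executes efficiently.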

More information at http://en.wikipedia.org/wiki/GPGPU

2243 questions
8
votes
5 answers

Why do we need a GPU for deep learning?

As the question already suggests, I am new to deep learning. I know that the learning process of the model will be slow without a GPU. If I am willing to wait, will it be OK if I use a CPU only?
Kanu
  • 91
  • 6
8
votes
7 answers

OpenCL FFT lib for GPUs?

Is there any general FFT library available for running on the GPU using OpenCL? As far as I know, Apple's sample code for a power-of-two OpenCL FFT is the only such code available. Does any such library exist for non-power-of-two transform…
Neo
  • 157
  • 1
  • 4
8
votes
3 answers

Speedup GPU vs CPU for matrix operations

I am wondering how much GPU computing would help me speed up my simulations. The critical part of my code is matrix multiplication. Basically, the code looks like the following Python code, with matrices of order 1000 and long for loops. import numpy…
physicsGuy
  • 3,437
  • 3
  • 27
  • 35
8
votes
2 answers

Does the cuDNN library work with all NVIDIA graphics cards?

I am studying the use of the cuDNN library in my project, but my NVIDIA graphics card is a little bit old. I searched the net to find whether cuDNN works with all graphics cards, but didn't find an answer, even on their main page. Which NVIDIA graphics cards are compatible with…
ProEns08
  • 1,856
  • 2
  • 22
  • 38
8
votes
3 answers

Does GLSL utilize SLI? Does OpenCL? What is better, GLSL or OpenCL for multiple GPUs?

To what extent does OpenGL's GLSL utilize SLI setups? Is it utilized at all at the point of execution, or only for final rendering? Similarly, I know that OpenCL is alien to SLI, but assuming one has several GPUs, how does it compare to GLSL in…
j riv
  • 3,593
  • 6
  • 39
  • 54
8
votes
1 answer

Syntax for functions other than vertex|fragment|kernel in a Metal shader file

I'm porting some basic OpenCL code to a Metal compute shader and got stuck pretty early when attempting to convert the miscellaneous helper functions. For example, when including something like the following function in a .metal file, Xcode (7.1) gives me a…
Jaysen Marais
  • 3,956
  • 28
  • 44
8
votes
1 answer

How to make the most of SIMD in OpenCL?

In the optimization guide of Beignet, an open source implementation of OpenCL targeting Intel GPUs, it says: "Work group size should be larger than 16 and be multiple of 16. As two possible SIMD lanes on Gen are 8 or 16." To not waste SIMD lanes, we need to…
user3528438
  • 2,737
  • 2
  • 23
  • 42
8
votes
2 answers

How to list CUDA devices in Windows 7 using cmd?

How can I display a list of the available CUDA devices in Windows 7 using the command line? Do I need to install additional software to do this?
mrgloom
  • 20,061
  • 36
  • 171
  • 301
8
votes
1 answer

Differences between clBLAS and ViennaCL?

Looking at the OpenCL libraries out there, I am trying to get a complete grasp of each one. One library in particular is clBLAS. Its website states that it implements BLAS levels 1, 2, and 3. That is great, but ViennaCL also has BLAS…
cdeterman
  • 19,630
  • 7
  • 76
  • 100
8
votes
6 answers

What's the most trivial function that would benefit from being computed on a GPU?

I'm just starting out learning OpenCL. I'm trying to get a feel for what performance gains to expect when moving functions/algorithms to the GPU. The most basic kernel given in most tutorials is a kernel that takes two arrays of numbers and sums…
hanDerPeder
  • 397
  • 2
  • 12
8
votes
1 answer

Does NVidia support OpenCL SPIR?

I am wondering whether NVIDIA supports a SPIR backend. If yes, I couldn't find any documentation or sample example about it. If not, is there any way to run a SPIR backend on NVIDIA GPUs? Thanks in advance.
grypp
  • 405
  • 2
  • 15
8
votes
1 answer

Does NVIDIA RDMA GPUDirect always operate only on physical addresses (in the physical address space of the CPU)?

As we know: http://en.wikipedia.org/wiki/IOMMU#Advantages Peripheral memory paging can be supported by an IOMMU. A peripheral using the PCI-SIG PCIe Address Translation Services (ATS) Page Request Interface (PRI) extension can detect and signal…
Alex
  • 12,578
  • 15
  • 99
  • 195
8
votes
1 answer

CUDA: What is the threads per multiprocessor and threads per block distinction?

We have a workstation with two Nvidia Quadro FX 5800 cards installed. Running the deviceQuery CUDA sample reveals that the maximum threads per multiprocessor (SM) is 1024, while the maximum threads per block is 512. Given that only one block can be…
James Paul Turner
  • 791
  • 3
  • 8
  • 23
8
votes
1 answer

Performance issues: Single CPU core vs Single CUDA core

I wanted to compare the speed of a single Intel CPU core with the speed of a single NVIDIA GPU core (i.e. a single CUDA core, a single thread). I implemented the following naive 2D image convolution algorithm: void convolution_cpu(uint8_t* res,…
AstrOne
  • 3,569
  • 7
  • 32
  • 54
8
votes
3 answers

Efficient bucket-sort on GPU

For a current OpenCL GPGPU project, I need to sort elements in an array according to a key with 64 possible values. I need the final array to have all elements with the same key contiguous. It's sufficient to have an associative array…
leemes
  • 44,967
  • 21
  • 135
  • 183