Questions tagged [gpgpu]

GPGPU is an acronym for the field of computer science known as "General Purpose computing on the Graphics Processing Unit (GPU)"

The two largest GPU manufacturers are NVIDIA and AMD, although Intel has also moved in this direction with its Haswell APUs. There are two popular frameworks for GPGPU: NVIDIA's CUDA, which is supported only on NVIDIA hardware, and OpenCL, developed by the Khronos Group, a consortium that includes AMD, NVIDIA, Intel, Apple and others. The OpenCL standard is only half-heartedly supported by NVIDIA, so the rivalry among GPU manufacturers is partly mirrored in a rivalry between the programming frameworks.

The attractiveness of using GPUs for other tasks largely stems from the parallel processing capabilities of many modern graphics cards. Some cards can have thousands of streams processing similar data at incredible rates.

In the past, CPUs emulated multiple threads and data streams by interleaving processing tasks on a single core. Over time, CPUs gained multiple cores, each running multiple threads. Modern video cards go much further: a GPU hosts far more concurrent threads than a typical CPU, coupled with extremely fast integrated memory. This huge number of threads in flight is made possible by SIMD (Single Instruction, Multiple Data), a technique in which many processing elements execute the same instruction on different data. The result is an environment uniquely suited to heavy computational loads that can be parallelized. This design also marks one of the main differences between GPUs and CPUs: each does best what it was designed for.
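As a minimal sketch of the SIMD model described above, here is a vector-addition kernel in CUDA C (the names `vecAdd`, `a`, `b`, `c` are purely illustrative): every thread executes the same instruction stream, but on a different element of the data.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Every thread runs identical code on a different element: the SIMD idea.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique index per thread
    if (i < n)                                      // guard against overrun
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory keeps the sketch short; cudaMalloc + cudaMemcpy also works.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // each element should be 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Note the `if (i < n)` guard: as the "if-else" question below touches on, threads within a warp that take different branches are serialized (warp divergence), which is why branch-heavy code tends to perform poorly on GPUs.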

More information at http://en.wikipedia.org/wiki/GPGPU

2243 questions
12
votes
2 answers

Matrix-vector multiplication in CUDA: benchmarking & performance

I'm updating my question with some new benchmarking results (I also reformulated the question to be more specific and I updated the code)... I implemented a kernel for matrix-vector multiplication in CUDA C following the CUDA C Programming Guide…
Pantelis Sopasakis
  • 1,902
  • 5
  • 26
  • 45
12
votes
4 answers

Getting started with PyOpenCL

I have recently discovered the power of GP-GPU (general purpose graphics processing unit) and want to take advantage of it to perform 'heavy' scientific and math calculations (that otherwise require big CPU clusters) on a single machine. I know that…
mariotoss
  • 414
  • 3
  • 7
  • 17
12
votes
2 answers

Is there a CUDA smart pointer?

If not, what is the standard way to free up cudaMalloced memory when an exception is thrown? (Note that I am unable to use Thrust.)
mchen
  • 9,808
  • 17
  • 72
  • 125
12
votes
2 answers

Running OpenCL on hardware from mixed vendors

I've been playing with the ATI OpenCL implementation in their Stream 2.0 beta. The OpenCL in the current beta only uses the CPU for now, the next version is supposed to support GPU kernels. I downloaded Stream because I have an ATI GPU in my work…
Roel
  • 19,338
  • 6
  • 61
  • 90
12
votes
1 answer

Aligning GPU memory accesses of an image convolution (OpenCL/CUDA) kernel

To understand how to make sure alignment requirement is met I read the following passage from the book Heterogeneous Computing with OpenCL p.no: 157, several times. This shows how to put padding for a problem in Image convolution (assuming 16 x 16…
gpuguy
  • 4,607
  • 17
  • 67
  • 125
11
votes
2 answers

Good books and resources on data parallel programming and algorithms

I've read the following and most of the NVIDIA manuals and other content. I was also at GTC last year for the papers and talks. CUDA by Example: An Introduction to General-Purpose GPU Programming Programming Massively Parallel Processors: A Hands-on…
Ade Miller
  • 13,575
  • 1
  • 42
  • 75
11
votes
3 answers

Is there a limit to OpenCL local memory?

Today I added four more __local variables to my kernel to dump intermediate results in. But just adding the four more variables to the kernel's signature and adding the corresponding Kernel arguments renders all output of the kernel to "0"s. None of…
Framester
  • 33,341
  • 51
  • 130
  • 192
11
votes
2 answers

Is there any way to find out and/or limit GPU usage by process in Windows?

I'd like to launch CPU and GPU intensive process on some machines, but these processes must not interfere with user's tasks. So I need to limit or at least detect GPU usage by my processes. These processes are closed-source, so I can't watch GPU…
LOST
  • 2,956
  • 3
  • 25
  • 40
11
votes
2 answers

Untrusted GPGPU code (OpenCL etc) - is it safe? What risks?

There are many approaches when it goes about running untrusted code on typical CPU : sandboxes, fake-roots, virtualization... What about untrusted code for GPGPU (OpenCL,cuda or already compiled one) ? Assuming that memory on graphics card is…
Grzegorz Wierzowiecki
  • 10,545
  • 9
  • 50
  • 88
11
votes
3 answers

How to optimize Conway's game of life for CUDA?

I've written this CUDA kernel for Conway's game of life: __global__ void gameOfLife(float* returnBuffer, int width, int height) { unsigned int x = blockIdx.x*blockDim.x + threadIdx.x; unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;…
11
votes
1 answer

Why an “if-else” statement (in GPUs code) will cut the performance in half

I read this article: FPGA or GPU? - The evolution continues And someone added a comment in which he wrote: Since GPUs are SIMD any code with an “if-else” statement will cut your performance in half. Half of the cores will execute the if part of…
user3668129
  • 4,318
  • 6
  • 45
  • 87
11
votes
3 answers

Why is NVIDIA Pascal GPUs slow on running CUDA Kernels when using cudaMallocManaged

I was testing the new CUDA 8 along with the Pascal Titan X GPU and is expecting speed up for my code but for some reason it ends up being slower. I am on Ubuntu 16.04. Here is the minimum code that can reproduce the result: CUDASample.cuh class…
user3667089
  • 2,996
  • 5
  • 30
  • 56
11
votes
2 answers

Using a GPU both as video card and GPGPU

Where I work, we do a lot of numerical computations and we are considering buying workstations with NVIDIA video cards because of CUDA (to work with TensorFlow and Theano). My question is: should these computers come with another video card to…
Ricardo Magalhães Cruz
  • 3,504
  • 6
  • 33
  • 57
11
votes
1 answer

Unable to generate gpg keys in linux

I'm not able to generate GPG keys in linux sudo gpg --gen-key # This is the command to try to generate key error You need a Passphrase to protect your secret key. gpg: problem with the agent: Timeout gpg: Key generation…
user2932003
  • 171
  • 2
  • 4
  • 14
11
votes
5 answers

Reducing Number of Registers Used in CUDA Kernel

I have a kernel which uses 17 registers, reducing it to 16 would bring me 100% occupancy. My question is: are there methods that can be used to reduce the number or registers used, excluding completely rewriting my algorithms in a different manner.…
zenna
  • 9,006
  • 12
  • 73
  • 101