Questions tagged [gpgpu]

GPGPU is an acronym for the field of computer science known as "General Purpose computing on the Graphics Processing Unit (GPU)". The two biggest manufacturers of GPUs are NVIDIA and AMD, although Intel has recently been moving in this direction with the integrated graphics of its Haswell processors. There are two popular frameworks for GPGPU: NVIDIA's CUDA, which is supported only on NVIDIA hardware, and OpenCL, developed by the Khronos Group. Khronos is a consortium whose members include AMD, NVIDIA, Intel, Apple and others, but the OpenCL standard is only half-heartedly supported by NVIDIA, so the rivalry among GPU manufacturers is partly mirrored in a rivalry between programming frameworks.

The appeal of using GPUs for non-graphics tasks stems largely from the parallel processing capabilities of modern graphics cards. Some cards can run thousands of streams over similar data at very high throughput.

In the past, CPUs emulated multithreading and multiple data streams by interleaving processing tasks. Over time, CPUs gained multiple cores, each running multiple threads. Video cards now carry GPUs that host far more threads or streams than most CPUs, together with very fast integrated memory. This large increase in the number of executing threads is achieved through SIMD (Single Instruction, Multiple Data), a technique in which one instruction operates on many data elements at once. The result is an environment well suited to heavy computational loads that can be parallelized. SIMD also marks one of the main differences between GPUs and CPUs: each does best what it was designed for.
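The data-parallel idea described above can be sketched in plain Python. This is only an illustration of the programming model, not GPU code: the names `saxpy_kernel` and `launch` are hypothetical, and the loop runs the indices one after another where a GPU would run them concurrently.

```python
# Sketch of the data-parallel model: one "kernel" body applied to many indices.
# saxpy_kernel and launch are hypothetical names used for illustration only.

def saxpy_kernel(i, a, x, y, out):
    """Work done by a single logical thread with index i."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Stand-in for a kernel launch: run the same instructions for n indices.

    A real GPU would execute these indices in parallel; this loop is sequential.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)
```

Each index reads and writes only its own element, which is what lets the work be spread across many threads without coordination.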

More information at http://en.wikipedia.org/wiki/GPGPU

2243 questions
16
votes
1 answer

Processing camera feed data on GPU (metal) and CPU (OpenCV) on iPhone

I'm doing realtime video processing on iOS at 120 fps and want to first preprocess the image on the GPU (downsampling, color conversion, etc., which are not fast enough on the CPU) and later postprocess the frame on the CPU using OpenCV. What's the fastest way to share camera…
pzo
  • 2,087
  • 3
  • 24
  • 42
16
votes
1 answer

Does nvidia-smi give instantaneous informations or an average on the interval?

When I use nvidia-smi -l 60, for example, I wonder whether: (1) the information given is a snapshot at the time it is taken, every 60 seconds, or (2) the information given is the average between that time and that time +/- 60 seconds. Do you know the answer…
Vincent Rossignol
  • 215
  • 1
  • 2
  • 8
16
votes
7 answers

Financial applications on GPGPU

I want to know what sort of financial applications can be implemented using a GPGPU. I'm aware of option pricing and stock price estimation using Monte Carlo simulation on GPGPU with CUDA. Can someone enumerate the various possibilities of utilizing…
CUDA-dev
  • 161
  • 1
  • 3
16
votes
2 answers

How to calculate the speedup of a GPU program?

Motivation: I have been tasked with measuring the Karp-Flatt metric and parallel efficiency of my CUDA C code, which requires computation of speedup. In particular, I need to plot all these metrics as a function of the number of processors…
mchen
  • 9,808
  • 17
  • 72
  • 125
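The metrics named in this question follow standard definitions: speedup is T(1)/T(p), parallel efficiency is speedup divided by p, and the Karp-Flatt metric is e = (1/speedup - 1/p) / (1 - 1/p). A minimal sketch follows. The function names and the timings are hypothetical, and what counts as the "number of processors" p on a GPU is left as an assumption for the reader to settle.

```python
# Sketch of speedup, parallel efficiency, and the Karp-Flatt metric.
# Function names and sample timings are hypothetical, for illustration only.

def speedup(t_serial, t_parallel):
    """Speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """Parallel efficiency: speedup divided by the processor count p."""
    return speedup(t_serial, t_parallel) / p


def karp_flatt(t_serial, t_parallel, p):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    if p <= 1:
        raise ValueError("Karp-Flatt metric requires p > 1")
    psi = speedup(t_serial, t_parallel)
    return (1.0 / psi - 1.0 / p) / (1.0 - 1.0 / p)


# Made-up timings: 100 s on one processor, 30 s on four.
t1, tp, p = 100.0, 30.0, 4
print(speedup(t1, tp), efficiency(t1, tp, p), karp_flatt(t1, tp, p))
```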
16
votes
1 answer

In OpenCL, what is the difference between platform, context, and device?

I am new to OpenCL programming. What is the difference between device, context, and platform?
sandeep.ganage
  • 1,409
  • 2
  • 21
  • 47
15
votes
4 answers

How to call a host function in a CUDA kernel?

As the following error implies, calling a host function ('rand') is not allowed in kernel, and I wonder whether there is a solution for it if I do need to do that. error: calling a host function("rand") from a __device__/__global__…
Hailiang Zhang
  • 17,604
  • 23
  • 71
  • 117
15
votes
3 answers

Decode video frames on iPhone GPU

I'm looking for the fastest way to decode a local mpeg-4 video's frames on the iPhone. I'm simply interested in the luminance values of the pixels in every 10th frame. I don't need to render the video anywhere. I've tried ffmpeg, AVAssetReader,…
simon.d
  • 2,471
  • 3
  • 33
  • 51
15
votes
2 answers

How To Structure Large OpenCL Kernels?

I have worked with OpenCL on a couple of projects, but have always written the kernel as one (sometimes rather large) function. Now I am working on a more complex project and would like to share functions across several kernels. But the examples I…
andrew cooke
  • 45,717
  • 10
  • 93
  • 143
15
votes
4 answers

Error compiling Cuda - expected primary-expression

This program seems to be fine, but I am still getting an error. Any suggestions? Program: #include "dot.h" #include #include #include int main(int argc, char** argv) { int *a, *b, *c; int *dev_a, *dev_b, *dev_c; …
Custodio
  • 8,594
  • 15
  • 80
  • 115
15
votes
1 answer

How to run GPGPU inside docker image with different from host kernel and GPU driver version

I have a machine with several GPUs. My idea is to attach them to different docker instances in order to use those instances in CUDA (or OpenCL) calculations. My goal is to set up a docker image with quite an old Ubuntu and quite old AMD video drivers…
petRUShka
  • 9,812
  • 12
  • 61
  • 95
15
votes
4 answers

Using Delphi to take advantage of GPGPU technology?

GPGPU is the principle of using the parallel processors on video cards for massive increases in performance. Does anyone have any ideas about using GPGPU in Delphi, using either OpenCL or CUDA? CUDA was/is NVidia only, but they have also adopted…
TallGuy
  • 151
  • 1
  • 3
15
votes
2 answers

Private cloud GPU virtualization similar to Amazon Web Services Cluster GPU instances

I am searching for options that enable dynamic cloud-based NVIDIA GPU virtualization similar to the way AWS assigns GPUs for Cluster GPU Instances. My project is working on standing up an internal cloud. One requirement is the ability to allocate…
Bob B
  • 4,484
  • 3
  • 24
  • 32
15
votes
3 answers

Difference between kernels construct and parallel construct

I have studied a lot of articles and the OpenACC manual, but I still don't understand the main difference between these two constructs.
pg1927
  • 153
  • 1
  • 6
15
votes
3 answers

modular arithmetic on the gpu

I am working on a GPU algorithm which is supposed to do a lot of modular computations, in particular various operations on matrices over a finite field, which in the long run reduce to primitive operations like (a*b - c*d) mod m or (a*b + c) mod m…
user1545642
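One way to evaluate the primitive (a*b - c*d) mod m is sketched below. It assumes the concern is keeping intermediate values small, as they would need to be in fixed-width integer arithmetic; the excerpt does not say this, so treat it as an assumption. The names `mul_mod` and `sub_mul_mod` are hypothetical, and the double-and-add multiplication shown is one common technique, not necessarily the fastest on a GPU.

```python
# Sketch: (a*b - c*d) mod m with every intermediate value kept below 2*m.
# mul_mod and sub_mul_mod are hypothetical names used for illustration only.

def mul_mod(a, b, m):
    """Compute (a * b) mod m by double-and-add, never forming the full product."""
    a %= m
    b %= m
    result = 0
    while b:
        if b & 1:
            result = (result + a) % m  # result + a < 2*m
        a = (a + a) % m                # a + a < 2*m
        b >>= 1
    return result


def sub_mul_mod(a, b, c, d, m):
    """Compute (a*b - c*d) mod m as a non-negative residue."""
    diff = mul_mod(a, b, m) - mul_mod(c, d, m)
    if diff < 0:
        diff += m  # explicit correction, as needed in languages with signed remainder
    return diff


m = 2**61 - 1
print(sub_mul_mod(123456789012345678, 987654321098765432, 5, 7, m))
```

Python integers do not overflow, so the built-in expression `(a*b - c*d) % m` can be used to check the result.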
15
votes
1 answer

Doing readback from Direct3D textures and surfaces

I need to figure out how to get the data from D3D textures and surfaces back to system memory. What's the fastest way to do such things and how? Also if I only need one subrect, how can one read back only that portion without having to read back…
Baxissimo
  • 2,629
  • 2
  • 25
  • 23