Questions tagged [gpgpu]

GPGPU is an acronym for the field of computer science known as "General Purpose computing on the Graphics Processing Unit (GPU)". The two biggest manufacturers of GPUs are NVIDIA and AMD, although Intel has recently been moving in this direction with the integrated graphics in its Haswell processors. There are two popular frameworks for GPGPU: NVIDIA's CUDA, which is supported only on NVIDIA hardware, and OpenCL, developed by the Khronos Group. Khronos is a consortium whose members include AMD, NVIDIA, Intel, Apple and others, but the OpenCL standard is only half-heartedly supported by NVIDIA, so the rivalry among GPU manufacturers is partly mirrored in the rivalry between programming frameworks.

The attractiveness of using GPUs for other tasks largely stems from the parallel processing capabilities of many modern graphics cards. Some cards can have thousands of streams processing similar data at incredible rates.

In the past, CPUs emulated threading and multiple data streams by interleaving processing tasks. Over time, we gained multiple cores with multiple threads. Now video cards house GPUs with many processing units, hosting many more threads or streams than most CPUs, together with extremely fast integrated memory. This huge increase in the number of executing threads is achieved through SIMD (Single Instruction, Multiple Data), a technique in which one instruction is applied to many data elements at once. This makes GPUs uniquely suited to heavy computational loads that can be parallelized. SIMD-style execution also marks one of the main differences between GPUs and CPUs, as each does best what it was designed for.
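
As a rough illustration of the data-parallel model described above, the sketch below emulates it in plain Python: one small function (the "kernel") is applied to every index of the input. The names `saxpy_kernel` and `launch` are hypothetical and do not belong to CUDA or OpenCL. On a real GPU the per-index invocations would run in parallel; here they run one after another.

```python
# Pure-Python emulation of the data-parallel (SIMD-style) model.
# saxpy_kernel and launch are hypothetical names used for illustration only;
# they are not part of CUDA, OpenCL, or any other GPGPU framework.

def saxpy_kernel(i, alpha, x, y, out):
    """Body that every 'thread' runs: same instruction, different element."""
    out[i] = alpha * x[i] + y[i]


def launch(kernel, n, *args):
    """Stand-in for a kernel launch: invoke the kernel once per index.

    A GPU would spread these n invocations across many hardware lanes;
    this loop runs them sequentially.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

Because the kernel body never depends on the result of another index, the invocations can be executed in any order, which is the property that lets hardware run them simultaneously.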

More information at http://en.wikipedia.org/wiki/GPGPU

2243 questions
6
votes
2 answers

CUDA: Does passing arguments to a kernel slow the kernel launch much?

CUDA beginner here. In my code I am currently launching kernels many times in a loop in the host code (because I need synchronization between blocks). So I wondered if I might be able to optimize the kernel launch. My kernel launches look…
Eskil
  • 3,385
  • 5
  • 28
  • 32
6
votes
2 answers

How can OpenGL ES be used for a GPGPU implementation?

I want to use OpenGL ES for a GPGPU implementation of some image processing code. I want to know whether I can use OpenGL ES for this purpose. If I can, then which version of OpenGL ES is more appropriate (OpenGL ES 1.1 or 2.0)?
Dr. Arslan
  • 1,254
  • 1
  • 16
  • 27
6
votes
1 answer

CUDA compiler is unable to compile a simple test program

I am trying to get NVIDIA's CUDA set up and installed on my PC, which has an NVIDIA GEFORCE RTX 2080 SUPER graphics card. After hours of trying different things and lots of research I have gotten CUDA to work using the Command Prompt, though trying to…
Pencilcaseman
  • 360
  • 6
  • 16
6
votes
4 answers

OpenCL image histogram

I'm trying to write a histogram kernel in OpenCL to compute 256 bin R, G, and B histograms of an RGBA32F input image. My kernel looks like this: const sampler_t mSampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP| …
wallacer
  • 12,855
  • 3
  • 27
  • 45
6
votes
3 answers

Figuring out how many blocks and threads for a CUDA kernel, and how to use them

I have been trying to figure out how to make what I thought would be a simple kernel to take the average of the values in a 2d matrix, but I am having some issues getting my thought process straight on it. According to my deviceQuery output, my GPU…
Derek
  • 11,715
  • 32
  • 127
  • 228
6
votes
2 answers

Resources for doing GPU-accelerated computation (GPGPU) on the iPhone?

I'm interested in doing GPU-accelerated computation in iOS (for iPhones 3GS and 4). Unfortunately, neither device supports OpenCL, so it seems the only choice is to express the program data as graphics data and use the OpenGL ES 2.0 programmable…
emchristiansen
  • 3,550
  • 3
  • 26
  • 40
6
votes
3 answers

How do I use the GPU available with OpenMP?

I am trying to get some code to run on the GPU using OpenMP, but I am not succeeding. In my code, I am performing a matrix multiplication using for loops: once using OpenMP pragma tags and once without. (This is so that I can compare the execution…
Josiah
  • 63
  • 1
  • 5
6
votes
1 answer

Concurrency, 4 CUDA Applications competing to get GPU resources

What would happen if there are four concurrent CUDA applications competing for resources on a single GPU so they can offload work to the graphics card? The CUDA Programming Guide 3.1 mentions that there are certain methods which are…
Bartzilla
  • 2,768
  • 4
  • 28
  • 37
6
votes
1 answer

Number of clock cycles per operation in GPU

Is there any way to find the number of clock cycles needed to perform different operations like division, subtraction and addition in GPU using CUDA?
starrr
  • 1,013
  • 1
  • 17
  • 48
6
votes
3 answers

In a GLSL fragment shader, how to access a texel at a specific mipmap level?

I am using OpenGL to do some GPGPU computations through the combination of one vertex shader and one fragment shader. I need to do computations on an image at different scales. I would like to use mipmaps since their generation can be automatic and…
Jim
  • 530
  • 6
  • 15
6
votes
2 answers

How to calculate GPU memory usage in Theano?

I am experimenting with different Theano models and use a curriculum of ever increasing sequence length. How can I predict ahead of time how big to make the batch size for any given sequence length and model in order to fill the GPU's memory? To…
Zach Dwiel
  • 529
  • 1
  • 4
  • 18
6
votes
1 answer

How can WebGL be used for general computing (GPGPU)?

I've heard that you can use WebGL for general computing (GPGPU) by generating textures and using the RGB values (or something like that) to run computations. How is this possible, and could someone please provide a simple example with…
HartleySan
  • 7,404
  • 14
  • 66
  • 119
6
votes
1 answer

Managing properly an array of results that is larger than the memory available at the GPU?

Having defined how to deal with errors: static void HandleError( cudaError_t err, const char *file, int line ) { if (err != cudaSuccess) { printf( "%s in %s at line %d\n",…
user3116936
  • 492
  • 3
  • 21
6
votes
1 answer

Retrieving values from arrayfire array as standard types and serialization

I recently saw arrayfire demonstrated at GTC and I thought I would try it. Here are some questions I have run into while trying to use it. I am running Visual Studio 2013 on a Windows 7 system with OpenCL from the AMD App SDK 2.9-1. The biggest…
dcofer
  • 303
  • 2
  • 10
6
votes
1 answer

How to launch custom OpenCL kernel in OpenCV (3.0.0) OCL?

I'm probably misusing OpenCV by using it as wrapper to the official OpenCL C++ bindings so that I can launch my own kernels. However, OpenCV does have classes like Program, ProgramSource, Kernel, Queue, etc. that seem to tell me that I can launch…
Mickael Caruso
  • 8,721
  • 11
  • 40
  • 72