Questions tagged [gpgpu]

GPGPU is an acronym for the field of computer science known as "General Purpose computing on the Graphics Processing Unit (GPU)"

The two biggest manufacturers of GPUs are NVIDIA and AMD, although Intel has recently been moving in this direction with its Haswell APUs. There are two popular frameworks for GPGPU: NVIDIA's CUDA, which is supported only on NVIDIA hardware, and OpenCL, developed by the Khronos Group, a consortium that includes AMD, NVIDIA, Intel, Apple and others. The OpenCL standard is only half-heartedly supported by NVIDIA, so the rivalry among GPU manufacturers is partly mirrored in a rivalry between the programming frameworks.

The attractiveness of using GPUs for other tasks largely stems from the parallel processing capabilities of modern graphics cards. Some cards contain thousands of stream processors that apply the same operations to different data elements at very high throughput.

In the past, CPUs emulated multiple threads and data streams by interleaving processing tasks on a single core. Over time, CPUs gained multiple cores, each capable of running several threads. Modern graphics cards integrate a large number of processing units together with very fast memory, and keep far more threads in flight than a typical CPU. This huge number of concurrently executing threads is made possible by the SIMD (Single Instruction, Multiple Data) execution model, which NVIDIA's variant calls SIMT (Single Instruction, Multiple Threads). This makes the GPU uniquely suited to heavy computational loads that can be parallelised, and the difference in execution model is one of the main distinctions between GPUs and CPUs: each does best what it was designed for.
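As a concrete illustration of this data-parallel model, here is a minimal CUDA sketch (the kernel name, array size and launch configuration are arbitrary choices for the example): every thread executes the same instruction stream, each on a different array element.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Every thread runs the same code on a different element (SIMD/SIMT style).
    __global__ void scaleAdd(const float* x, const float* y, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
        if (i < n)                                       // guard the tail
            out[i] = 2.0f * x[i] + y[i];
    }

    int main()
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);

        float *x, *y, *out;                 // unified memory keeps the sketch short
        cudaMallocManaged(&x, bytes);
        cudaMallocManaged(&y, bytes);
        cudaMallocManaged(&out, bytes);
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        int block = 256;                    // threads per block
        int grid  = (n + block - 1) / block;
        scaleAdd<<<grid, block>>>(x, y, out, n);
        cudaDeviceSynchronize();

        printf("out[0] = %f\n", out[0]);    // expect 4.0
        cudaFree(x); cudaFree(y); cudaFree(out);
        return 0;
    }

The launch spawns roughly a million threads; the hardware schedules them in groups (warps) that execute the same instruction in lockstep, which is exactly the SIMD/SIMT behaviour described above.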

More information at http://en.wikipedia.org/wiki/GPGPU

2243 questions
1 vote · 2 answers
OpenCL version of cudaMemcpyToSymbol & optimization
Can someone tell me the OpenCL version of cudaMemcpyToSymbol for copying a __constant buffer to the device and getting it back to the host? Or will the usual clEnqueueWriteBuffer(...) do the job? Could not find much help in the forum. Actually a few lines of demo will suffice. …
gpuguy · 4,607
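For context, this is a minimal sketch of the CUDA pattern the question above wants to translate (the variable and kernel names are made up for the example). On the OpenCL side the usual approach is an ordinary cl_mem buffer written with clEnqueueWriteBuffer / read with clEnqueueReadBuffer and declared __constant in the kernel signature; there is no direct symbol-based copy.

    #include <cstdio>
    #include <cuda_runtime.h>

    // CUDA side: a __constant__ array is filled from the host with
    // cudaMemcpyToSymbol and read back with cudaMemcpyFromSymbol.
    __constant__ float coeffs[4];

    __global__ void useCoeffs(float* out)
    {
        int i = threadIdx.x;
        if (i < 4)
            out[i] = coeffs[i] * 10.0f;    // every thread reads constant memory
    }

    int main()
    {
        float host[4] = {1.f, 2.f, 3.f, 4.f};
        cudaMemcpyToSymbol(coeffs, host, sizeof(host));     // host -> __constant__

        float* d_out;
        cudaMalloc(&d_out, sizeof(host));
        useCoeffs<<<1, 4>>>(d_out);

        float back[4];
        cudaMemcpyFromSymbol(back, coeffs, sizeof(back));   // __constant__ -> host
        cudaMemcpy(host, d_out, sizeof(host), cudaMemcpyDeviceToHost);
        printf("coeffs[2] read back = %f, out[2] = %f\n", back[2], host[2]);

        cudaFree(d_out);
        return 0;
    }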
1 vote · 1 answer
clGetDeviceIDs fails in OpenCL with error code -30
The output of the following program on my machine with an ATI FirePro V8750 is: "Couldn't find any devices: No error" (this happens at the first call to clGetDeviceIDs). The error code returned is -30. What does that mean? I am not able to…
gpuguy · 4,607
1 vote · 1 answer
Copying array from RAM to GPU and from GPU to RAM
I'm trying to introduce some CUDA optimizations in one of my projects. But I think I'm doing something wrong here. I want to implement a simple matrix-vector multiplication (result = matrix * vector). But when I want to copy the result back to the…
alfa · 3,058
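The excerpt above is cut off, so the following is only a generic sketch of the copy-in / compute / copy-back pattern such a matrix-vector multiplication usually follows; the sizes, names and the one-thread-per-row kernel are assumptions, not the asker's code. The copy-back step at the end is where the destination/source order and the cudaMemcpyDeviceToHost flag are easy to get wrong.

    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    // One thread per output row: result[r] = dot(matrix row r, vector).
    __global__ void matVec(const float* m, const float* v, float* result,
                           int rows, int cols)
    {
        int r = blockIdx.x * blockDim.x + threadIdx.x;
        if (r < rows) {
            float acc = 0.0f;
            for (int c = 0; c < cols; ++c)
                acc += m[r * cols + c] * v[c];
            result[r] = acc;
        }
    }

    int main()
    {
        const int rows = 512, cols = 512;
        std::vector<float> h_m(rows * cols, 1.0f), h_v(cols, 2.0f), h_r(rows);

        float *d_m, *d_v, *d_r;
        cudaMalloc(&d_m, h_m.size() * sizeof(float));
        cudaMalloc(&d_v, h_v.size() * sizeof(float));
        cudaMalloc(&d_r, h_r.size() * sizeof(float));

        // RAM -> GPU
        cudaMemcpy(d_m, h_m.data(), h_m.size() * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(d_v, h_v.data(), h_v.size() * sizeof(float), cudaMemcpyHostToDevice);

        matVec<<<(rows + 255) / 256, 256>>>(d_m, d_v, d_r, rows, cols);

        // GPU -> RAM: destination is the host pointer, source is the device
        // pointer, and the kind is cudaMemcpyDeviceToHost.
        cudaError_t err = cudaMemcpy(h_r.data(), d_r, h_r.size() * sizeof(float),
                                     cudaMemcpyDeviceToHost);
        printf("status: %s, result[0] = %f\n", cudaGetErrorString(err), h_r[0]); // expect 1024

        cudaFree(d_m); cudaFree(d_v); cudaFree(d_r);
        return 0;
    }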
1 vote · 1 answer
Maximum (shared memory per block) / (threads per block) in CUDA with 100% MP load
I'm trying to process an array of big structures with CUDA 2.0 (NVIDIA 590). I'd like to use shared memory for it. I've experimented with the CUDA occupancy calculator, trying to allocate the maximum shared memory per thread, so that each thread can process…
mirror2image · 290
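The structure-processing kernel itself isn't shown above, so here is only a generic sketch of the dynamic shared-memory pattern that the occupancy calculator reasons about: the bytes of shared memory per block are passed as the third launch parameter and scale with the threads-per-block choice, which is exactly the trade-off against occupancy the question is probing.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Dynamic shared memory: size is the third <<<...>>> parameter, so
    // "shared memory per block" follows the threads-per-block choice.
    __global__ void blockSum(const float* in, float* out, int n)
    {
        extern __shared__ float tile[];               // one float per thread
        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid;
        tile[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();

        // simple tree reduction within the block (block size must be a power of two)
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                tile[tid] += tile[tid + stride];
            __syncthreads();
        }
        if (tid == 0)
            out[blockIdx.x] = tile[0];
    }

    int main()
    {
        const int n = 1 << 16, block = 256, grid = (n + block - 1) / block;
        size_t smemPerBlock = block * sizeof(float);  // bytes of shared memory per block

        float *d_in, *d_out;
        cudaMalloc(&d_in, n * sizeof(float));
        cudaMalloc(&d_out, grid * sizeof(float));
        cudaMemset(d_in, 0, n * sizeof(float));

        blockSum<<<grid, block, smemPerBlock>>>(d_in, d_out, n);
        cudaDeviceSynchronize();
        printf("launched %d blocks with %zu bytes of shared memory each\n",
               grid, smemPerBlock);

        cudaFree(d_in); cudaFree(d_out);
        return 0;
    }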
1 vote · 1 answer
What is the difference between the OpenCL functions length() and fast_length()?
On page three of this OpenCL reference sheet (broken link) there are two built-in vector length functions with identical parameters: length() and fast_length(). What is the difference between these functions? I gather from the name one is 'faster'…
sebf · 2,831
1 vote · 1 answer
CUDA on integrated GPU + external device
I have a Dell desktop PC which has an integrated GPU. If I add one more GPU over PCIe, will I be able to run CUDA? Probably yes. The integrated GPU has its own driver (i915) and I am not sure what will happen with the NVIDIA driver (for the second GPU)…
amanda · 394
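In typical setups the integrated GPU simply is not a CUDA device, so it does not appear to the CUDA runtime at all; only the added NVIDIA card is enumerated once its driver is installed. A small sketch like the following (standard runtime API calls, assuming the NVIDIA driver and toolkit are present) is a quick way to check what the runtime actually sees.

    #include <cstdio>
    #include <cuda_runtime.h>

    // List the CUDA-capable devices visible to the runtime. A non-NVIDIA
    // integrated GPU (e.g. one driven by i915) will not be listed.
    int main()
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (int d = 0; d < count; ++d) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            printf("device %d: %s, compute capability %d.%d\n",
                   d, prop.name, prop.major, prop.minor);
        }
        return 0;
    }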
1 vote · 1 answer
Passing GPUArray to feval
I have the following kernel: __global__ void func( float * arr, int N ) { int rtid = blockDim.x * blockIdx.x + threadIdx.x; if( rtid < N ) { float* row = (float*)((char*)arr + rtid*N*sizeof(float) ); for (int c = 1; c…
VIHARRI PLV · 587
1 vote · 0 answers
Profiler shows OpenCL not using all available registers
Here is a copy of the occupancy analysis of my kernel from the NVIDIA Compute Visual Profiler: Kernel details: Grid size: 300 x 1, Block size: 224 x 1 x 1. Register Ratio = 0.75 ( 24576 / 32768 ) [48 registers per thread]. Shared Memory Ratio =…
altair211 · 97
0 votes · 2 answers
Advanced Encryption Standard on GPU using CUDA
I am a CUDA developer assisting undergrad students in implementing AES on the GPU. They don't have much knowledge of cryptography, and this is also the first time I am working on it. I have a few questions, if anyone could answer them. How do we…
Bilal · 25
0 votes · 2 answers
OpenCL - waste of host computing power
I am new to OpenCL. Please tell me whether the host CPU can be used only for allocating memory for the device, or whether it can also be used as an OpenCL device (because after the allocation is done, the host CPU will be idle).
0 votes · 2 answers
Information on current GPU Architectures
I have decided that my bachelor's thesis will be about general-purpose GPU computing and which problems are more suitable for this than others. I am also trying to find out if there are any major differences between the current GPU architectures that…
vichle · 2,499
0 votes · 1 answer
OpenHMPP in GCC
The gist of the question is: Do you know any projects that aim to bring OpenHMPP support to GCC? I could also possibly live with affordable commercial compilers, but it's very unlikely, because I prefer Linux, and I would like the compiler to…
enobayram · 4,650
0 votes · 1 answer
"cast" GL_R8 to GL_BGRA
I'm doing some GPGPU programming with OpenGL. I want to be able to write all my data to one-dimensional textures with the format GL_R8, so that I can basically treat it as an std::array object. Then during rendering I would like to be able to set…
ronag · 49,529
0 votes · 2 answers
Trouble with CUDA Memory Allocation and Access
I am working on learning CUDA right now. I have some basic experience with MPI so I figured I'd start with some really simple vector operations. I am trying to write a parallelized dot product thing. I am either having trouble allocating/writing…
Joe · 320
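Since the excerpt above breaks off at the allocation step, here is only a generic sketch of the allocate / copy / launch / copy-back sequence for a dot product (a pairwise multiply followed by a host-side sum, which keeps the memory traffic easy to check); none of the names or sizes are the asker's.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each thread multiplies one pair of elements; the partial products are
    // summed on the host. (A real dot product would reduce on the device.)
    __global__ void pairwiseMul(const float* a, const float* b, float* prod, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            prod[i] = a[i] * b[i];
    }

    int main()
    {
        const int n = 1024;
        float h_a[n], h_b[n], h_prod[n];
        for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 3.0f; }

        float *d_a, *d_b, *d_prod;
        // Common pitfall: cudaMalloc takes the ADDRESS of the device pointer.
        cudaMalloc(&d_a, n * sizeof(float));
        cudaMalloc(&d_b, n * sizeof(float));
        cudaMalloc(&d_prod, n * sizeof(float));

        cudaMemcpy(d_a, h_a, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, n * sizeof(float), cudaMemcpyHostToDevice);

        pairwiseMul<<<(n + 255) / 256, 256>>>(d_a, d_b, d_prod, n);
        cudaMemcpy(h_prod, d_prod, n * sizeof(float), cudaMemcpyDeviceToHost);

        float dot = 0.0f;
        for (int i = 0; i < n; ++i) dot += h_prod[i];
        printf("dot = %f (expected %f)\n", dot, 3.0f * n);

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_prod);
        return 0;
    }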
0 votes · 1 answer
Advice needed regarding GPGPU library
I am writing an application and eventually it comes to a well-parallelisable part: two-dimensional float initialData and result arrays; for each cell (a, b) in the result array: for each cell (i, j) in initialData: result(a, b) +=…
user380041
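The loop body in the excerpt above is cut off, so the following is only a sketch of how such a nested accumulation typically maps onto a GPU: one thread per result cell, each looping over the whole input. The contribution() function is a placeholder I made up, not the asker's formula, and a library (OpenCL, Thrust, ArrayFire, etc.) would organise the same computation in a similar way.

    #include <cstdio>
    #include <cuda_runtime.h>

    // PLACEHOLDER: the real per-cell contribution is not shown in the excerpt.
    __device__ float contribution(float value, int a, int b, int i, int j)
    {
        int di = a > i ? a - i : i - a;
        int dj = b > j ? b - j : j - b;
        return value / (1.0f + di + dj);   // made-up weighting for illustration
    }

    // One thread per result cell (a, b); it loops over all of initialData.
    __global__ void accumulate(const float* initialData, float* result,
                               int inW, int inH, int outW, int outH)
    {
        int a = blockIdx.x * blockDim.x + threadIdx.x;
        int b = blockIdx.y * blockDim.y + threadIdx.y;
        if (a >= outW || b >= outH) return;

        float acc = 0.0f;
        for (int j = 0; j < inH; ++j)
            for (int i = 0; i < inW; ++i)
                acc += contribution(initialData[j * inW + i], a, b, i, j);
        result[b * outW + a] = acc;
    }

    int main()
    {
        const int inW = 64, inH = 64, outW = 64, outH = 64;
        float *d_in, *d_out;
        cudaMalloc(&d_in,  inW * inH * sizeof(float));
        cudaMalloc(&d_out, outW * outH * sizeof(float));
        cudaMemset(d_in, 0, inW * inH * sizeof(float));

        dim3 block(16, 16);
        dim3 grid((outW + block.x - 1) / block.x, (outH + block.y - 1) / block.y);
        accumulate<<<grid, block>>>(d_in, d_out, inW, inH, outW, outH);
        cudaDeviceSynchronize();
        printf("kernel finished: %s\n", cudaGetErrorString(cudaGetLastError()));

        cudaFree(d_in); cudaFree(d_out);
        return 0;
    }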