Questions tagged [gpgpu]

GPGPU is an acronym for the field of computer science known as "General-Purpose computing on the Graphics Processing Unit (GPU)". The two biggest manufacturers of GPUs are NVIDIA and AMD, although Intel has also been moving in this direction, for example with the integrated graphics in its Haswell processors. There are two popular frameworks for GPGPU: NVIDIA's CUDA, which is supported only on NVIDIA's own hardware, and OpenCL, developed by the Khronos Group, a consortium that includes AMD, NVIDIA, Intel, Apple, and others. However, NVIDIA supports the OpenCL standard only half-heartedly, so the rivalry among GPU manufacturers is partially mirrored in a rivalry between the programming frameworks.

The attractiveness of using GPUs for other tasks largely stems from the parallel processing capabilities of modern graphics cards: some cards contain thousands of stream processors operating on similar data at very high rates.

In the past, CPUs first emulated multiple data streams by interleaving processing tasks; over time, we gained multiple cores, each running multiple threads. Modern video cards house GPUs that host far more concurrent threads than CPUs do, integrated with extremely fast memory. This huge number of threads in flight is achieved through SIMD (Single Instruction, Multiple Data): many threads execute the same instruction on different data elements. This makes the GPU an environment uniquely suited to heavy computational loads that can be parallelized. It also marks one of the main differences between GPUs and CPUs: each does best what it was designed for.

More information at http://en.wikipedia.org/wiki/GPGPU

2243 questions
1
vote
1 answer

CUDA Toolkit 4.1/4.2: nvcc Crashes with an Access Violation

I am developing a CUDA application for a GTX 580 with Visual Studio 2010 Professional on 64-bit Windows 7. My project builds fine with CUDA Toolkit 4.0, but nvcc crashes when I choose CUDA Toolkit 4.1 or 4.2, with the following error: 1> Stack dump: …
meriken2ch
  • 409
  • 5
  • 15
1
vote
1 answer

Which is better: atomic contention between threads of a single warp, or between threads of different warps?

Which is better: atomic contention (concurrency) between threads of a single warp, or between threads of different warps in one block? I think that when accessing shared memory it is better when threads of one warp are competing with each…
Alex
  • 12,578
  • 15
  • 99
  • 195
1
vote
1 answer

About floating-point operations

Recently, I have been writing a program (an FDTD computation) using the CUDA development environment. The OS is Windows Server 2008, the graphics card is a Tesla C2070, and the compiler is VS2010. This program calculates using single- and double-precision floating point. I was…
오승택
  • 45
  • 1
  • 5
1
vote
1 answer

Java: Cast or reference multidimensional array into single dimensional array

I have a program written in Java which involves a massive amount of multidimensional arrays. I am trying to parallelize it using JOCL (OpenCL), but the multidimensional arrays have to be converted to single-dimensional arrays before being passed to OpenCL.…
aaronqli
  • 790
  • 9
  • 26
1
vote
1 answer

clSetKernelArg changed arg_value from 16 to 140733193388048?

I'm delving into OpenCL by making a Matrix dot product implementation. I'm having a problem with getting my kernels to return the same values as my host. I have made an encapsulation function that allocates device memory, sets parameters to a…
user1509669
  • 233
  • 2
  • 7
1
vote
0 answers

Read/Write the registers on a GPU

Is it possible to read/write from/to the registers on the GPU using OpenCL? I am using an NVIDIA GeForce 9400 GT graphics card. I tried googling, but there is not much information out there. Can someone tell me if this is possible and, if yes, how?
Nike
  • 455
  • 1
  • 5
  • 16
1
vote
0 answers

Understanding Registers in OpenCL

I am a little confused regarding the usage of registers internally by OpenCL kernels. I am using -cl-nv-verbose to capture the register usage for my kernel. At the moment, my kernel is recording ptxas info: Used 4 registers for some code in the…
Omar Khan
  • 68
  • 6
1
vote
2 answers

How do I get started with CUDA development on Ubuntu 9.04?

How do I get started with CUDA development on Ubuntu 9.04? Are there any prebuilt binaries? Are the default accelerated drivers sufficient? My thought is to actually work with OpenCL, but that seems to be hard to do right now, so I thought that I…
Per Arneng
  • 2,100
  • 5
  • 21
  • 32
1
vote
3 answers

Shift vector in thrust

I'm looking at a project involving online (streaming) data. I want to work with a sliding window of that data. For example, say that I want to hold 10 values in my vector. When value 11 comes in, I want to drop value 1, shift everything over, and…
Noah
  • 567
  • 8
  • 19
1
vote
1 answer

Optimizing a threaded simultaneous check

I have a device function that checks a byte array using threads, each thread checking a different byte in the array for a certain value and returns bool true or false. How can I efficiently decide if all the checks have returned true or otherwise?
gamerx
  • 579
  • 5
  • 16
1
vote
2 answers

Perfect hashing for OpenCL

I have a set (static, known in compile time) of about 2 million values, 20 bytes each. What I need is a fast O(1) way to check if a given value is in this set. It seems that perfect hash function with a bit array is ideal for this, but I can't find…
aplavin
  • 2,199
  • 5
  • 32
  • 53
1
vote
2 answers

cudaStreamDestroy() does not synchronize/block?

I'm using CUDA 4.2 on a Quadro NVS 295 on a Win7 x64 machine. From the CUDA C Programming Manual I read this: "...Streams are released by calling cudaStreamDestroy(). for (int i = 0; i < 2; ++i) cudaStreamDestroy(stream[i]); cudaStreamDestroy()…
ACRay
  • 13
  • 1
  • 4
1
vote
1 answer

OpenCL producing QNaN on NVidia hardware

I'm programming in OpenCL using the C++ bindings. I have a problem where on NVidia hardware, my OpenCL code is spontaneously producing very large numbers, and then (on the next run) a "1.#QNaN". My code is pretty much a simple physics simulation…
Chaosed0
  • 949
  • 1
  • 10
  • 20
1
vote
3 answers

How to use shared memory between kernel launches in CUDA?

I want to use values in shared memory over multiple launches of the same kernel. Can I do that?
Amin
  • 371
  • 1
  • 2
  • 7
1
vote
1 answer

Image cross-correlation with Matlab GPGPU, indexing into 3d array

The problem I'm encountering is writing code such that the built-in features of Matlab's GPU programming will correctly divide data for parallel execution. Specifically, I'm sending N 'particle' images to the GPU's memory, organized in a 3-d array…
ejmunson
  • 11
  • 1