Questions tagged [gpgpu]

GPGPU is an acronym for the field of computer science known as "General-Purpose computing on the Graphics Processing Unit (GPU)".

GPGPU is an acronym for the field of computer science known as "General-Purpose computing on the Graphics Processing Unit (GPU)". The two biggest manufacturers of GPUs are NVIDIA and AMD, although Intel has recently been moving in this direction with its Haswell APUs. There are two popular frameworks for GPGPU: NVIDIA's CUDA, which is supported only on its own hardware, and OpenCL, developed by the Khronos Group, a consortium that includes AMD, NVIDIA, Intel, Apple and others. The OpenCL standard is, however, only half-heartedly supported by NVIDIA, so the rivalry among GPU manufacturers is partially mirrored in a rivalry between the programming frameworks.

The attractiveness of using GPUs for other tasks largely stems from the parallel processing capabilities of many modern graphics cards. Some cards have thousands of stream processors operating on similar data at incredible rates.

In the past, CPUs emulated multiple threads and data streams by interleaving processing tasks. Over time, they gained multiple cores, each running multiple threads. Modern video cards house a number of GPU cores hosting far more concurrent threads or streams than a typical CPU, integrated with extremely fast memory. This huge increase in concurrently executing threads is achieved through SIMD (Single Instruction, Multiple Data): many execution units apply the same instruction to different data elements at once. This makes GPUs uniquely suited to heavy computational loads that can be parallelized. It also marks one of the main differences between GPUs and CPUs: each does best what it was designed for.

More information at http://en.wikipedia.org/wiki/GPGPU
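The SIMD-style data parallelism described above is easiest to see in a minimal CUDA kernel. The following is an illustrative sketch (the names `vecAdd`, `a`, `b`, `c` are invented for the example): thousands of threads execute the same instruction stream, each on its own element of the arrays.

```cuda
#include <cstdio>

// Each thread computes one element of the output: many threads run the
// same instructions on different data (the SIMD/SIMT model).
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // guard against the last, partial block
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified memory keeps the example short; explicit cudaMalloc and
    // cudaMemcpy would work equally well.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);  // one thread per element
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

On a CPU the same loop would run a handful of iterations at a time; here the grid launches one lightweight thread per element, which is exactly the workload shape GPUs were designed for.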

2243 questions
11
votes
1 answer

Is "cudaMallocManaged" slower than "cudaMalloc"?

I downloaded CUDA 6.0 RC and tested the new unified memory by using "cudaMallocManaged" in my application. However, I found this kernel is slowed down. Using cudaMalloc followed by cudaMemcpy is faster (~0.56), compared to cudaMallocManaged…
Genutek
  • 387
  • 1
  • 5
  • 11
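For readers with the same question: the two allocation styles differ in when data reaches the device, and with managed memory the migration cost can land inside the first kernel that touches the pages. A sketch of both paths (illustrative only; `scale` is an invented kernel):

```cuda
#include <cstdio>
#include <cstdlib>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;

    // Explicit path: cudaMalloc + cudaMemcpy. The host-to-device copy
    // happens up front, so the kernel launch itself is cheap.
    float *h = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();

    // Managed path: one pointer valid on host and device. Pages last
    // touched by the host may be migrated to the GPU on demand when the
    // kernel accesses them, which can show up as a slower kernel.
    float *m;
    cudaMallocManaged(&m, n * sizeof(float));
    for (int i = 0; i < n; ++i) m[i] = 1.0f;   // written on the host first
    scale<<<(n + 255) / 256, 256>>>(m, n);
    cudaDeviceSynchronize();                   // needed before host reads m

    printf("both paths done\n");
    cudaFree(d); cudaFree(m); free(h);
    return 0;
}
```

Timing the two `scale` launches separately is the usual way to confirm that the difference is page migration rather than the kernel itself.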
11
votes
1 answer

How to optimize OpenCL code for neighbor access?

Edit: Results of the proposed solutions are added at the end of the question. I'm starting to program with OpenCL, and I have created a naive implementation of my problem. The theory is: I have a 3D grid of elements, where each element has a bunch of…
Alex
  • 1,449
  • 4
  • 18
  • 28
11
votes
1 answer

Modifying registry to increase GPU timeout, Windows 7

I'm trying to increase the timeout on the GPU from its default setting of 2 seconds to something a little longer. I found the following link, but it appears it's slightly different in Windows 7, as I can't see anything mentioned in the webpage. Has…
Hans Rudel
  • 3,433
  • 5
  • 39
  • 62
11
votes
4 answers

What should a very simple Makefile look like for CUDA compiling under Linux

I want to compile a very basic hello-world-level CUDA program under Linux. I have three files: the kernel: helloWorld.cu, main method: helloWorld.cpp, common header: helloWorld.h. Could you write me a simple Makefile to compile this with nvcc and…
Vereb
  • 14,388
  • 2
  • 28
  • 30
11
votes
2 answers

Are NVIDIA's GPUs big-endian or little-endian?

I need to do a lot of bit-wise operations on GPUs, but cannot find any information regarding whether NVIDIA GPU hardware is big or little-endian.
user0002128
  • 2,785
  • 2
  • 23
  • 40
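Questions like this are easiest to settle empirically on the device itself. (NVIDIA GPUs are little-endian, like the x86 hosts they typically sit beside.) A minimal sketch:

```cuda
#include <cstdio>

// Writes 1 to *result if the device is little-endian, 0 otherwise.
__global__ void endianness(int *result) {
    unsigned int x = 1;
    // On a little-endian machine the least significant byte is stored
    // at the lowest address, so the first byte of the word 1 is 1.
    *result = (*reinterpret_cast<unsigned char *>(&x) == 1);
}

int main() {
    int *r;
    cudaMallocManaged(&r, sizeof(int));
    endianness<<<1, 1>>>(r);
    cudaDeviceSynchronize();
    printf("GPU is %s-endian\n", *r ? "little" : "big");
    cudaFree(r);
    return 0;
}
```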
11
votes
2 answers

large integer addition with CUDA

I've been developing a cryptographic algorithm on the GPU and am currently stuck on an algorithm to perform large integer addition. Large integers are represented in the usual way as a bunch of 32-bit words. For example, we can use one thread to add…
user1545642
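The common technique for multi-word addition on NVIDIA hardware is the PTX add-with-carry chain (`add.cc` / `addc.cc` / `addc`), issued as a single inline-asm block so the carry flag survives between instructions. The sketch below adds two 128-bit integers stored as four 32-bit limbs; the names `add128` and `demo` are invented for the example, and production code would more likely reach for a library such as NVlabs' CGBN.

```cuda
#include <cstdio>

// Adds two 128-bit integers, each stored as four 32-bit limbs
// (least significant limb first), using PTX add-with-carry.
__device__ void add128(const unsigned int a[4], const unsigned int b[4],
                       unsigned int r[4]) {
    asm("add.cc.u32  %0, %4, %8;\n\t"   // lowest limb: sets the carry flag
        "addc.cc.u32 %1, %5, %9;\n\t"   // middle limbs: consume and set carry
        "addc.cc.u32 %2, %6, %10;\n\t"
        "addc.u32    %3, %7, %11;"      // top limb: consumes the final carry
        : "=r"(r[0]), "=r"(r[1]), "=r"(r[2]), "=r"(r[3])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]), "r"(b[2]), "r"(b[3]));
}

__global__ void demo(unsigned int *out) {
    const unsigned int a[4] = {0xFFFFFFFFu, 0xFFFFFFFFu, 0u, 0u}; // 2^64 - 1
    const unsigned int b[4] = {1u, 0u, 0u, 0u};                   // 1
    add128(a, b, out);   // carry should ripple into the third limb
}

int main() {
    unsigned int *r;
    cudaMallocManaged(&r, 4 * sizeof(unsigned int));
    demo<<<1, 1>>>(r);
    cudaDeviceSynchronize();
    printf("%08x %08x %08x %08x\n", r[3], r[2], r[1], r[0]);
    cudaFree(r);
    return 0;
}
```

Keeping the whole chain in one `asm` statement matters: the condition-code register is not guaranteed to be preserved across separate inline-asm statements.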
11
votes
3 answers

GLSL - Does a dot product really only cost one cycle?

I've come across several situations where the claim is made that doing a dot product in GLSL will end up being run in one cycle. For example: Vertex and fragment processors operate on four-vectors, performing four-component instructions such as…
ultramiraculous
  • 1,062
  • 14
  • 21
10
votes
1 answer

OpenGL ES vs OpenCL vs RenderScript for Android Image Processing

I need to build an image processing application for Android. Performance is the main requirement, and I am looking to use GPU compute. I want to know which of the 3 libraries is best to use. I know OpenGL is primarily for graphics but also supports…
xSooDx
  • 493
  • 1
  • 5
  • 19
10
votes
2 answers

What are the real C++ language constructs supported by CUDA device code?

Appendix D of the 3.2 version of the CUDA documentation refers to C++ support in CUDA device code. It is clearly mentioned that CUDA supports "Classes for devices of compute capability 2.x". However, I'm working with devices of compute capability…
jopasserat
  • 5,721
  • 4
  • 31
  • 50
10
votes
1 answer

Shaders in place of GPGPU

I want to experiment with some GPGPU in the first place. I could have chosen between 5 choices out there: OpenCL, CUDA, FireStream, Close to Metal, DirectCompute. Well, not really: after filtering them for my needs, none suits :) I am using a Radeon 3870HD,…
Raven
  • 4,783
  • 8
  • 44
  • 75
10
votes
2 answers

C# Bitmap GetPixel(), SetPixel() in GPU

I am using Cudafy as a C# wrapper. I need to get colour info InputBitmap0.GetPixel(x, y) of a bitmap and make a new bitmap for output. I need the following work to be done in the GPU. In CPU: OutputBitmap.SetPixel(object_point_x, object_point_y,…
Md Sifatul Islam
  • 846
  • 10
  • 28
10
votes
6 answers

Is there any possibility to write GPU-applications using CUDA under F sharp?

I am interested in using F# for numerical computation. How can I access the GPU using NVIDIA's CUDA standard under F#?
Martin
  • 181
  • 1
  • 4
10
votes
1 answer

Working with many fixed-size matrices in CUDA kernels

I am looking to work with about 4000 fixed-size (3x3, 4x4) matrices, doing things such as matrix inversion and eigendecomposition. It seems to me the best way to parallelize this would be to let each of the many GPU threads work on a single instance of…
Daniel
  • 510
  • 3
  • 15
10
votes
1 answer

What is the difference between the CUDA toolkit and the CUDA SDK

I am installing CUDA on Ubuntu 14.04 and have a Maxwell card (GTX 9** series), and I think I have installed everything properly with the toolkit, as I can compile my samples. However, I read in places that I should install the SDK (this appears…
bubblebath
  • 939
  • 4
  • 18
  • 45
10
votes
1 answer

cudaDeviceSynchronize() waits to finish only in current CUDA context or in all contexts?

I use CUDA 6.5 and 4 x Kepler GPUs. I use multithreading, the CUDA runtime API, and access the CUDA contexts from different CPU threads (by using OpenMP - but it does not really matter). When I call cudaDeviceSynchronize(), will it wait for kernel(s)…
Alex
  • 12,578
  • 15
  • 99
  • 195