Questions tagged [gpgpu]

GPGPU is an acronym for the field of computer science known as "General Purpose computing on the Graphics Processing Unit (GPU)"

The two biggest manufacturers of discrete GPUs are NVIDIA and AMD, although Intel has also been moving in this direction with the integrated GPUs in its Haswell-generation processors. There are two popular frameworks for GPGPU: NVIDIA's CUDA, which is supported only on its own hardware, and OpenCL, developed by the Khronos Group, a consortium that includes AMD, NVIDIA, Intel, Apple and others. NVIDIA's support for the OpenCL standard has been half-hearted, so the rivalry among GPU manufacturers is partially mirrored in a rivalry between the programming frameworks.

The attractiveness of using GPUs for other tasks largely stems from the parallel processing capabilities of modern graphics cards. Some cards contain thousands of stream processors executing the same operations on different data at very high rates.

In the past, CPUs emulated multiple threads or data streams by interleaving processing tasks on a single core. Over time we gained multiple cores, each running multiple threads. Modern video cards go further: they house processors hosting many more threads or streams than most CPUs, tightly integrated with extremely fast memory. This huge increase in concurrently executing threads is achieved through SIMD (Single Instruction, Multiple Data), in which one instruction is applied to many data elements at once. This makes the GPU uniquely suited to heavy computational loads that can be parallelized. It also marks one of the main differences between GPUs and CPUs: each does best what it was designed for.
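As a loose illustration of the SIMD idea (a sketch in NumPy, not actual GPU code): applying one operation to a whole array at once, rather than looping element by element, is the same single-instruction-multiple-data pattern that GPU hardware executes natively.

```python
import numpy as np

# Scalar view: one instruction per element, the way a plain CPU loop works.
def saxpy_scalar(a, x, y):
    out = [0.0] * len(x)
    for i in range(len(x)):     # one multiply-add per iteration
        out[i] = a * x[i] + y[i]
    return out

# SIMD view: a single vectorized operation applied to all elements at once.
# NumPy dispatches this to vectorized native code; a GPU kernel would map
# each element to its own thread in the same pattern.
def saxpy_simd(a, x, y):
    return a * x + y

x = np.arange(4, dtype=np.float32)   # [0, 1, 2, 3]
y = np.ones(4, dtype=np.float32)
print(saxpy_simd(2.0, x, y))         # [1. 3. 5. 7.]
```

The function names here are illustrative ("saxpy" is the classic single-precision a*x + y benchmark); on a real GPU the per-element work would be a CUDA or OpenCL kernel rather than a NumPy expression.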

More information at http://en.wikipedia.org/wiki/GPGPU

2243 questions
14
votes
1 answer

Branch predication on GPU

I have a question about branch predication in GPUs. As far as I know, in GPUs, they do predication with branches. For example I have a code like this: if (C) A else B so if A takes 40 cycles and B takes 50 cycles to finish execution, if assuming…
Zk1001
  • 2,033
  • 4
  • 19
  • 36
14
votes
2 answers

Are GPU/CUDA cores SIMD ones?

Let's take the nVidia Fermi Compute Architecture. It says: The first Fermi based GPU, implemented with 3.0 billion transistors, features up to 512 CUDA cores. A CUDA core executes a floating point or integer instruction per clock for a thread. The…
Marc Andreson
  • 3,405
  • 5
  • 35
  • 51
14
votes
1 answer

Why is the constant memory size limited in CUDA?

According to "CUDA C Programming Guide", a constant memory access benefits only if a multiprocessor constant cache is hit (Section 5.3.2.4)1. Otherwise there can be even more memory requests for a half-warp than in case of the coalesced global…
AdelNick
  • 982
  • 1
  • 8
  • 17
13
votes
2 answers

GPU utilization 0% during TensorFlow retraining for poets

I am following instructions for TensorFlow Retraining for Poets. GPU utilization seemed low so I instrumented the retrain.py script per the instructions in Using GPU. The log verifies that the TF graph is being built on GPU. I am retraining for a…
Lars Ericson
  • 1,952
  • 4
  • 32
  • 45
13
votes
4 answers

Import PGP public key by string

I want to import a PGP public key into my keychain in a script, but I don't want it to write the contents to a file. Right now my script does this: curl http://example.com/pgp-public-key -o /tmp/pgp && gpg --import /tmp/gpg How could I write this…
Paradoxis
  • 4,471
  • 7
  • 32
  • 66
13
votes
3 answers

Does Global Work Size Need to be Multiple of Work Group Size in OpenCL?

Hello: Does Global Work Size (Dimensions) Need to be Multiple of Work Group Size (Dimensions) in OpenCL? If so, is there a standard way of handling matrices not a multiple of the work group dimensions? I can think of two possibilities: Dynamically…
Junier
  • 1,622
  • 1
  • 15
  • 21
13
votes
1 answer

OpenMP 4.0 in GCC: offload to nVidia GPU

TL;DR - Does GCC (trunk) already support OpenMP 4.0 offloading to nVidia GPU? If so, what am I doing wrong? (description below). I'm running Ubuntu 14.04.2 LTS. I have checked out the most recent GCC trunk (dated 25 Mar 2015). I have installed the…
Marc Andreson
  • 3,405
  • 5
  • 35
  • 51
13
votes
8 answers

Feasibility of GPU as a CPU?

What do you think the future of GPU as a CPU initiatives like CUDA are? Do you think they are going to become mainstream and be the next adopted fad in the industry? Apple is building a new framework for using the GPU to do CPU tasks and there has…
AutomaticPixel
  • 261
  • 6
  • 11
13
votes
1 answer

Are GPU shaders Turing complete

I understand that complete GPUs are behemoths of computing - including every step of calculation, and memory. So obviously a GPU can compute whatever we want - it's Turing complete. My question is in regard to a single shader on various GPUs…
Trevor
  • 1,858
  • 4
  • 21
  • 28
13
votes
4 answers

Parallel GPU computing using OpenCV

I have an application that requires processing multiple images in parallel in order to maintain real-time speed. It is my understanding that I cannot call OpenCV's GPU functions in a multi-threaded fashion on a single CUDA device. I have tried an…
mmccullo
  • 173
  • 1
  • 1
  • 7
13
votes
5 answers

How to check for GPU on CentOS Linux

It is suggested that on Linux, GPU be found with the command lspci | grep VGA. It works fine on Ubuntu but when I try to use the same on CentOS, it says lspci command is not found. How can I check for the GPU card on CentOS. And note that I'm not…
pythonic
  • 20,589
  • 43
  • 136
  • 219
12
votes
6 answers

Disassemble an OpenCL kernel?

I'm not sure if it's possible. I want to study OpenCL in-depth, so I was wondering if there is a tool to disassemble an compiled OpenCL kernel. For normal x86 executable, I can use objdump to get a disassembly view. Is there a similar tool for…
Patrick
  • 4,186
  • 9
  • 32
  • 45
12
votes
1 answer

How to calculate pairwise distance matrix on the GPU

The bottleneck in my code is the area where I calculate a pairwise distance matrix. Since this is the slowest part by far, I have spent much time in speeding up my code. I have found many speedups using articles online, but the gains have been…
Paul Terwilliger
  • 1,596
  • 1
  • 20
  • 45
12
votes
2 answers

GPU Shared Memory Bank Conflict

I am trying to understand how bank conflicts take place. I have an array of size 256 in global memory and I have 256 threads in a single block, and I want to copy the array to shared memory. Therefore every thread copies one…
scatman
  • 14,109
  • 22
  • 70
  • 93
12
votes
1 answer

How do I use Nvidia Multi-process Service (MPS) to run multiple non-MPI CUDA applications?

Can I run non-MPI CUDA applications concurrently on NVIDIA Kepler GPUs with MPS? I'd like to do this because my applications cannot fully utilize the GPU, so I want them to co-run together. Is there any code example to do this?
dalibocai
  • 2,289
  • 5
  • 29
  • 45