Questions tagged [opencl]

OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors.

This tag refers to OpenCL (Open Computing Language) by the Khronos Group. It is the first open, royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices. Using OpenCL, one can run parallel computations that greatly improve the speed and responsiveness of a wide spectrum of applications, from gaming and entertainment to scientific and medical software.

OpenCL is an API and a C99-like language; implementations are provider-specific for each device. Implementation providers include NVIDIA, Intel, and AMD.

Questions about OpenCL can be asked here along with the vendor/provider and architecture details. Bug reports should be discussed in the respective vendor forums: NVIDIA Forums, Intel Forums, AMD Forums.

5705 questions
21
votes
2 answers

Installing additional files with CMake

I am attempting to supply some "source" files with some executables. I was wondering if there was a way to copy these source files to the build directory (from the source directory) and then to the install directory using CMake. My more specific goal…
Constantin
  • 16,812
  • 9
  • 34
  • 52
20
votes
5 answers

OpenCL vs. DirectCompute?

I'm looking for comparisons between OpenCL and DirectCompute, but I haven't found anything. OpenCL's advantages of being cross-platform and having a wider range of supported GPUs don't matter to me. I'm fine with coding on Windows against DX11…
royco
  • 5,409
  • 13
  • 60
  • 84
20
votes
1 answer

How to use pinned memory / mapped memory in OpenCL

In order to reduce the transfer time from host to device for my application, I want to use pinned memory. NVIDIA's best practices guide proposes mapping buffers and writing the data using the following code: cDataIn = (unsigned…
krisg
  • 271
  • 2
  • 11
19
votes
3 answers

How to declare local memory in OpenCL?

I'm running the OpenCL kernel below with a two-dimensional global work size of 1000000 x 100 and a local work size of 1 x 100. __kernel void myKernel( const int length, const int height, and a bunch of other parameters) { …
user1111929
  • 6,050
  • 9
  • 43
  • 73
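
For the sizes quoted in the question above (global work size 1000000 x 100, local work size 1 x 100), the number of work-groups in each dimension is the global size divided by the local size. The following is a minimal sketch in plain C with no OpenCL headers; the helper names `work_groups_in_dim` and `work_groups_2d` are hypothetical, and the divisibility check assumes the OpenCL 1.x rule that the global size must be a multiple of the local size.

```c
#include <stddef.h>

/* Hypothetical helper: number of work-groups along one dimension.
 * Returns 0 when the local size is zero or does not evenly divide the
 * global size, a combination that OpenCL 1.x rejects with
 * CL_INVALID_WORK_GROUP_SIZE. */
size_t work_groups_in_dim(size_t global_size, size_t local_size)
{
    if (local_size == 0 || global_size % local_size != 0)
        return 0;
    return global_size / local_size;
}

/* Hypothetical helper: total work-groups for a 2D NDRange.
 * Returns 0 if either dimension is invalid. */
size_t work_groups_2d(const size_t global[2], const size_t local[2])
{
    return work_groups_in_dim(global[0], local[0]) *
           work_groups_in_dim(global[1], local[1]);
}
```

With `global = {1000000, 100}` and `local = {1, 100}` this gives 1000000 work-groups of 100 work-items each. A `__local` array declared inside the kernel is allocated once per work-group and shared by the work-items of that group.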
19
votes
1 answer

How many threads (or work-items) can run at the same time?

I'm new to GPGPU programming and I'm working with NVIDIA's implementation of OpenCL. My question is how to compute the limit of a GPU device (in number of threads). From what I understood, there are a number of work-groups (equivalent of blocks in…
Laure Jonchery
  • 266
  • 1
  • 3
  • 5
19
votes
4 answers

Why aren't there bank conflicts in global memory for CUDA/OpenCL?

One thing I haven't figured out, and Google isn't helping me, is why it is possible to have bank conflicts with shared memory but not in global memory. Can there be bank conflicts with registers? UPDATE Wow I really appreciate the two answers from…
smuggledPancakes
  • 9,881
  • 20
  • 74
  • 113
19
votes
5 answers

What is the difference between creating a buffer object with clCreateBuffer + CL_MEM_COPY_HOST_PTR vs. clCreateBuffer + clEnqueueWriteBuffer?

I have seen both versions in tutorials, but I could not find out what their advantages and disadvantages are. Which one is the proper one? cl_mem input = clCreateBuffer(context,CL_MEM_READ_ONLY,sizeof(float) * DATA_SIZE, NULL,…
Framester
  • 33,341
  • 51
  • 130
  • 192
19
votes
3 answers

Is it possible to access hard disk directly from gpu?

Is it possible to access a hard disk/flash disk directly from the GPU (CUDA/OpenCL) and load/store content directly from the GPU's memory? I am trying to avoid copying stuff from disk to memory and then copying it over to the GPU's memory. I read about…
L Lawliet
  • 2,565
  • 4
  • 26
  • 35
19
votes
3 answers

Convenient way to show OpenCL error codes?

As per title, is there a convenient way to show readable OpenCL error codes? Being able to convert codes like '-1000' to a name would save a lot of time browsing through error codes.
Selmar
  • 722
  • 1
  • 5
  • 12
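
One common approach to the question above is a lookup table from numeric codes to their symbolic names. The following is a minimal sketch, not a complete mapping: the type `ocl_error_entry` and the function `ocl_error_name` are hypothetical names, and the numeric values are copied from the Khronos `cl.h` and `cl_gl.h` headers so that the example builds without an OpenCL SDK. Only a small subset of codes is listed.

```c
#include <stddef.h>

/* Hypothetical table entry: a numeric OpenCL status code and its name. */
typedef struct {
    int code;
    const char *name;
} ocl_error_entry;

/* Subset of status codes; values copied from the Khronos headers. */
static const ocl_error_entry ocl_error_table[] = {
    {     0, "CL_SUCCESS" },
    {    -1, "CL_DEVICE_NOT_FOUND" },
    {    -2, "CL_DEVICE_NOT_AVAILABLE" },
    {    -3, "CL_COMPILER_NOT_AVAILABLE" },
    {    -4, "CL_MEM_OBJECT_ALLOCATION_FAILURE" },
    {    -5, "CL_OUT_OF_RESOURCES" },
    {    -6, "CL_OUT_OF_HOST_MEMORY" },
    {   -11, "CL_BUILD_PROGRAM_FAILURE" },
    {   -30, "CL_INVALID_VALUE" },
    {   -54, "CL_INVALID_WORK_GROUP_SIZE" },
    { -1000, "CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR" },
};

/* Returns the symbolic name for a status code, or a fallback string
 * when the code is not in the table above. */
const char *ocl_error_name(int code)
{
    size_t count = sizeof(ocl_error_table) / sizeof(ocl_error_table[0]);
    for (size_t i = 0; i < count; ++i) {
        if (ocl_error_table[i].code == code)
            return ocl_error_table[i].name;
    }
    return "UNKNOWN_CL_ERROR";
}
```

In host code, the return value of an OpenCL call can be passed to `ocl_error_name` before logging, for example `fprintf(stderr, "%s\n", ocl_error_name(err));`. A real project would extend the table to cover every code in the headers it builds against.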
18
votes
1 answer

The variation of cache misses in GPU

I have been toying with an OpenCL kernel that accesses 7 global memory buffers, does something with the values and stores the result back to an 8th global memory buffer. As I observed, as the input size increases, the L1 cache miss ratio (= misses / (misses + hits))…
Zk1001
  • 2,033
  • 4
  • 19
  • 36
18
votes
3 answers

"Unrolling" a recursive function?

I'm writing a path tracer in C++ and I'd like to try to implement the most resource-intensive code in CUDA or OpenCL (I'm not sure which one to pick). I've heard that my graphics card's version of CUDA doesn't support recursion, which is…
Blender
  • 289,723
  • 53
  • 439
  • 496
18
votes
5 answers

When to use OpenCL?

Having stumbled over this forum thread, dot product faster on cpu than on gpu using OpenCL, I was reminded again that there are instances which look like they're made for OpenCL*, but where OpenCL, when used, does not provide us with a gain.…
Framester
  • 33,341
  • 51
  • 130
  • 192
18
votes
5 answers

List of OpenCL compliant CPU/GPU

How can I know which CPUs can be programmed with OpenCL? For example, the Pentium E5200. Is there a way to know without running and querying it?
Lior Dagan
  • 189
  • 1
  • 1
  • 3
18
votes
4 answers

Is it fair to compare SSE/AVX units to GPU cores?

I have a presentation to make to people who have (almost) no clue of how a GPU works. I think saying that a GPU has a thousand cores where a CPU only has four to eight of them is nonsense. But I want to give my audience an element of…
Simon
  • 860
  • 7
  • 23
17
votes
2 answers

OpenCL - is it possible to invoke another function from within a kernel?

I am following along with a tutorial located here: http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%201 The kernel they have listed is this, which computes the sum of two numbers and stores it in the output variable: __kernel void…
Adam S
  • 8,945
  • 17
  • 67
  • 103