2

I read the OpenCL overview, and it states it is suitable for code that runs on CPUs, GPGPUs, DSPs, etc. However, from looking through the command reference, it seems to be all math- and image-type operations. I didn't see anything for, say, strings.

This makes me wonder: what would you run on a CPU via OpenCL?

Further, I know OpenCL can be used to perform sorting on GPGPUs. But would one ever use it (or, for that matter, a current GPGPU) to perform string processing such as pattern matching, metaphone extraction, dictionary lookup, or anything else that requires the processing of arrays of strings?

EDIT I noticed that Intel's upcoming Ivy Bridge is touted as "OpenCL compliant" with reference to its graphics units. Does this imply that the CPU cores are not OpenCL compliant, or is there no such implication?

EDIT In the interests of non-debate and constructiveness, I would appreciate it if anyone could point me to official references that would answer my question.

IamIC
  • Who is saying that Intel's Ivy Bridge GPU is "OpenCL compliant", and what does "OpenCL compliant" mean? Intel CPUs support OpenCL using the Intel OpenCL SDK. – vocaro Jan 30 '12 at 20:30

3 Answers

1

No links, but I would assume this is because string algorithms tend to do a lot of dynamic memory allocation and branching, neither of which GPGPUs are well suited for. GPGPUs also have a lot in common with vector processors, so working on differently sized blocks of memory (which string algorithms generally do, since you usually don't have a homogeneous group of strings) yields poorer performance and is hard to program.

GPUs were designed to do the same work, with little to no branching, on a homogeneous group of data (such as per-vector or per-pixel operations). Algorithms that can mimic this type of behavior are great on GPUs.
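
As a minimal, hypothetical sketch of the branching problem (the kernel name and arguments below are invented for illustration), consider a kernel where each work item branches on its own data, as data-dependent string code tends to do. Work items in the same SIMD group that take different paths get serialized:

    // Work items that take different branches within a SIMD group are
    // serialized: the hardware runs both paths and masks off inactive lanes.
    __kernel void divergent(__global const int *lengths, __global int *out) {
        int i = get_global_id(0);
        if (lengths[i] < 16)           // data-dependent branch: each work
            out[i] = lengths[i] * 2;   // item may go a different way
        else
            out[i] = lengths[i] - 16;
    }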

onit
  • That makes sense. Do you have any idea what OpenCL code one would run on a CPU? This seems like an impractical idea to me. – IamIC Jan 30 '12 at 15:36
  • My experience with GPGPUs has generally been with CUDA. From my understanding though, OpenCL is just trying to provide a platform that makes it easy to write multi-threaded code, regardless of the architecture. Some multi-threaded applications, such as a web server algorithm creating pages of dynamic content, may not be well-suited for GPGPUs for the reasons I stated in my post. Though I have no idea what type of projects are currently being targeted by OpenCL. – onit Jan 30 '12 at 15:40
  • 1
    OpenCL is not a way to do multi-threading. OpenCL has work items that are all supposed to do the same computation on different pieces of data, but can't communicate and they probably aren't all running at the same time. – Steve Blackwell Jan 30 '12 at 23:06
  • @SteveBlackwell Perhaps multi-threading was the wrong word to use, because it may imply a traditional CPU threading model. From what I've seen of OpenCL, it borrows much of CUDA's threading model, including the idea that each task runs as a separate thread in a kernel, which is what I meant. I could say it does multiprocessing to be more accurate, but personally I think that is confusing because CUDA refers to units of work as threads. From my understanding, an OpenCL "work item" is exactly the same thing as a CUDA thread. – onit Jan 30 '12 at 23:59
  • @onit The two concepts are pretty similar, but I think OpenCL's work items are a little more stripped down. CUDA can call a global barrier to sync all threads, for example, and OpenCL can't. CUDA also allows dynamic memory allocation in kernel, and OpenCL doesn't. So yeah, it's kind of just a matter of SIMD terminology (OpenCL _could have_ called them threads also), but I think those are some important distinctions when it comes to actually writing the kernels. – Steve Blackwell Jan 31 '12 at 16:37
  • 2
    @SteveBlackwell In CUDA there is no global barrier to sync all threads. You can sync all threads in a block, which will sync all threads running on the same processor (i.e. threads running on a block are all run on the same core on the GPU). However, syncing all threads in a kernel is impossible, unless you exit the kernel and sync externally which is possible in OpenCL as well. Also, dynamic memory allocation is a relatively new feature in CUDA, and may be implemented in OpenCL in the future, though on both platforms it would lead to poor performance. – onit Jan 31 '12 at 16:53
1

You can think of OpenCL as a combination of a runtime (for device discovery and queueing) and a C-based programming language. This language has native vector types, plus built-in functions and operators for doing all sorts of fun stuff with those vectors. This is nice in that you can write a vectorized kernel in OpenCL, and it is the responsibility of the implementation to map that to the actual vector ISA of your hardware.
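
As a rough illustration (the kernel below is my own sketch, not from any official sample), a SAXPY-style kernel written with the float4 vector type expresses four lanes of work per statement, and the implementation decides how that maps to SSE, AVX, NEON, or GPU SIMD lanes:

    // One float4 statement covers four scalar elements; the OpenCL
    // implementation lowers it to the device's native vector ISA.
    __kernel void saxpy4(float a,
                         __global const float4 *x,
                         __global float4 *y) {
        int i = get_global_id(0);
        y[i] = a * x[i] + y[i];
    }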

From this 4/2011 article, which might vanish:

There are two major CPU architectures out there, x86 and ARM, both of which should soon run OpenCL code.

If you write an OpenCL application that targets both of these architectures, you wouldn't have to worry about writing two versions, one SSE and one NEON. Just write OpenCL C and be done with it. Yes, I know. This assumes the vendor has done his job and written a solid implementation that fully utilizes the underlying ISA. But if he doesn't, complain!

In addition, some CL implementations offer auto-vectorization of scalar kernels, which are usually easier to write. A good auto-vectorizer would give you a solid performance increase for no effort. Since CL kernels are compiled "online," obtaining such a benefit wouldn't require shipping rebuilt code.
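
For comparison, here is the scalar form of the same hypothetical kernel. An implementation with an auto-vectorizer can widen this across adjacent work items on its own, and because kernels are built at run time with clBuildProgram(), the gain arrives without shipping a rebuilt binary:

    // Scalar kernel: easier to write, and a good auto-vectorizer can
    // widen it across work items at online-compile time.
    __kernel void saxpy(float a,
                        __global const float *x,
                        __global float *y) {
        int i = get_global_id(0);
        y[i] = a * x[i] + y[i];
    }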

James
1

This makes me wonder: what would you run on a CPU via OpenCL?

I prefer to use OpenCL to offload work from the CPU to my graphics hardware. Sometimes there is a limitation with my video card, so I like having a backup kernel for CPU use. Such limitations can be memory size, a memory bottleneck, low clock speed, or the PCIe bus getting in the way.

I say I like using a separate kernel for the CPU because I think all kernels should be tweaked to run on their target hardware. I even like to have an OpenMP backup plan, as most algorithms I use get tested out that way ahead of time.

I suppose it is best practice to test a GPU kernel on the CPU to make sure it runs as expected. If a user of your software has OpenCL installed but only a CPU (or a low-end GPU), it's nice to be able to execute the same code on different devices.
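
A host-side sketch of that fallback (error handling trimmed, and the helper name is invented) might look like this: ask the platform for a GPU first, and if that fails, take the CPU device so the same kernel source still runs:

    /* Try for a GPU; fall back to the CPU device if none is usable. */
    #include <CL/cl.h>

    cl_device_id pick_device(cl_platform_id platform) {
        cl_device_id dev;
        if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU,
                           1, &dev, NULL) != CL_SUCCESS) {
            clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &dev, NULL);
        }
        return dev;
    }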

mfa