I'm wondering whether I can use SIMD intrinsics in GPU code, such as a CUDA kernel or an OpenCL one. Is that possible?
4 Answers
No, SIMD intrinsics are just tiny wrappers for ASM code. They are CPU specific. More about them here.
Generally speaking, why would you do that? CUDA and OpenCL already contain many "functions" which are actually "GPU intrinsics" (all of these, for example, are single-precision math intrinsics for the GPU).

Also, just to be precise, CUDA inherently uses SIMD already. You write code that runs on many threads simultaneously in lockstep, so a single instruction operates on multiple data values at once, each being processed in the context of a different thread. – Jason R Feb 19 '13 at 14:17
You can use the vector data types built into the OpenCL C language, for example float4 or float8. If you run with the Intel or AMD device drivers, these should be converted to SSE/AVX instructions by the vendor's OpenCL device driver. OpenCL includes several functions, such as dot(v1, v2), which should use the SSE/AVX dot-product instructions. Is there a particular intrinsic you are interested in that you don't think you can get from the OpenCL C language?
Mostly no, because GPU programming languages use a different programming model (SIMT). However, AMD GPUs do have an OpenCL extension that provides intrinsics for some byte-granularity operations (thus allowing four values to be packed into one 32-bit GPU register). These operations are intended for video processing.

Some SIMD-in-a-word functions that operate on four bytes or two half-words respectively were recently posted on NVIDIA's registered developer website: https://devtalk.nvidia.com/default/topic/528624/announcements/new-download-simd-in-a-word-functions/ – njuffa Feb 19 '13 at 18:39
Version 1.1 of the SIMD-in-a-word functions has been posted. This release doubles the number of supported operations: https://devtalk.nvidia.com/default/topic/535684/announcements/release-1-1-of-simd-in-a-word-functions-posted/ – njuffa Mar 30 '13 at 00:27
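To make the "SIMD-in-a-word" idea concrete, here is a minimal sketch of a byte-wise add on four values packed into one 32-bit word. It is plain C++ rather than device code, and it is not NVIDIA's implementation: the names `pack4` and `add_bytes4` are hypothetical and chosen only for illustration. The same integer arithmetic should carry over to a CUDA or OpenCL kernel, since it uses nothing beyond 32-bit masks, adds, and XORs.

```cpp
#include <cstdint>

// Hypothetical helper: pack four unsigned bytes into one 32-bit word,
// with b0 in the least significant byte.
constexpr std::uint32_t pack4(std::uint8_t b0, std::uint8_t b1,
                              std::uint8_t b2, std::uint8_t b3) {
    return std::uint32_t(b0) | (std::uint32_t(b1) << 8) |
           (std::uint32_t(b2) << 16) | (std::uint32_t(b3) << 24);
}

// Hypothetical helper: add the four byte lanes of a and b independently.
// Each lane wraps around modulo 256 and never carries into its neighbour.
constexpr std::uint32_t add_bytes4(std::uint32_t a, std::uint32_t b) {
    // Add only the low 7 bits of every byte; the largest per-lane result is
    // 0x7F + 0x7F = 0xFE, so nothing spills over a byte boundary.
    // Then fold in the top bit of each lane with XOR, which is an add
    // whose carry-out is discarded.
    return ((a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu)) ^ ((a ^ b) & 0x80808080u);
}
```

With these definitions, `add_bytes4(pack4(1, 200, 255, 16), pack4(2, 100, 1, 240))` evaluates to `0x00002C03`: the lanes come out as 3, 44 (300 modulo 256), 0, and 0, and the overflow in one lane does not disturb the next.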
Yes, you can use SIMD intrinsics in kernel code on the CPU or GPU, provided the compiler supports these intrinsics.
Usually the better way to use SIMD is to use the vector data types in the kernels, so that the compiler decides whether to use SIMD based on availability; this makes the kernel code portable as well.
