
My current question is an extension of my previous question, SIMD-8, SIMD-16 or SIMD-32 in OpenCL on GPGPU.

I understand the concept of SIMD programming on a GPU: all the scalar instructions from different work-items are executed together in a warp/wavefront/SIMD-width group. My understanding is that if we write a packed vector instruction in kernel code, the compiler converts that instruction into scalar instructions, and during execution all the work-items in a SIMD-width group execute the same instruction.

1) Now, if we use a built-in like mad provided by OpenCL, how will this be executed on the GPU? Will all the work-items execute it as a mad, or will it be turned into scalar operations first?

2) If mad is executed on all the work-items, will the SIMD width be reduced from 32 to 16, or from 16 to 8?

  • I question this premise: "My understanding here is that if we write a packed vector instruction in kernel code, compiler converts that instruction into scalars". It may be true of some platforms but not all. – Tim Child Sep 01 '15 at 16:30
  • OK.. If the compiler does not convert it into scalars, what will be the behavior? Any understanding here? – Manish Kumar Sep 17 '15 at 03:37
  • See page 3 of the AMD GCN Whitepaper; it explains that with the AMD SIMD architecture the compiler bundles 4 or 5 independent instructions: "AMD GPUs consisted of a number of SIMD engines, each of which consisted of up to 16 ALUs. Each ALU could execute bundles of 4 or 5 independent instructions" – Tim Child Sep 17 '15 at 15:18

0 Answers