My current question is an extension of my previous question, "SIMD-8, SIMD-16 or SIMD-32 in OpenCL on GPGPU".
I understand the concept of SIMD execution on a GPU: the same scalar instruction is executed together for all the work-items in a warp / wavefront / SIMD-width group. My understanding is that if we write a packed vector instruction in kernel code, the compiler converts that instruction into scalar instructions, and during execution all the work-items in a SIMD-width group execute the same instruction.
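To make that mental model concrete, here is a rough sketch in plain C rather than OpenCL. Everything in it is an assumption made for illustration: `float4_t` is a hypothetical stand-in for OpenCL's float4, `SIMD_WIDTH` is an arbitrary lane count, and `mad4_scalarized_simd` only models what I believe a scalarizing compiler would do. It is not taken from any real compiler or driver.

```c
#define SIMD_WIDTH 8   /* assumed lane count; real hardware may use 8, 16 or 32 */
#define VEC_SIZE   4   /* components in the packed vector type */

/* Hypothetical stand-in for OpenCL's float4. */
typedef struct { float s[VEC_SIZE]; } float4_t;

/* What the kernel source expresses for ONE work-item:
   a single packed multiply-add, r = a * b + c. */
static float4_t mad4_packed(float4_t a, float4_t b, float4_t c) {
    float4_t r;
    for (int k = 0; k < VEC_SIZE; ++k)
        r.s[k] = a.s[k] * b.s[k] + c.s[k];
    return r;
}

/* My assumed execution model: the packed operation becomes VEC_SIZE
   scalar multiply-adds (outer loop = instruction sequence), and each
   scalar instruction runs for every work-item in the SIMD group in
   lockstep (inner loop = lanes). Arrays are indexed by lane. */
static void mad4_scalarized_simd(const float4_t *a, const float4_t *b,
                                 const float4_t *c, float4_t *out) {
    for (int k = 0; k < VEC_SIZE; ++k) {                 /* one scalar instruction */
        for (int lane = 0; lane < SIMD_WIDTH; ++lane) {  /* all lanes together */
            out[lane].s[k] = a[lane].s[k] * b[lane].s[k] + c[lane].s[k];
        }
    }
}
```

In this model the number of lanes stays at `SIMD_WIDTH` and only the number of instructions issued grows, which is exactly the assumption my questions below are about.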
1) Now, if we use a built-in such as mad provided by OpenCL, how will it be executed on the GPU? Will all the work-items execute it as a mad, or will it be turned into scalar instructions first?
2) If mad is executed on all the work-items, will the SIMD width be reduced from 32 to 16, or from 16 to 8?