
In my OpenCL kernel I use 16-bit floating-point values of type half from the cl_khr_fp16 extension.

Although this gives me code that works well, I noticed with AMD's Radeon developer tools that the reciprocal is computed in 32 bits (the GPU target is gfx1102, RDNA3).

[Screenshot: Radeon GPU Analyzer]

So the value is first converted from half precision to single precision, then the reciprocal is computed, and then the result is converted back into half precision.

This happens even though both the numerator and the denominator of the division are in half precision.

I know that CUDA has a function for this, hrcp, so I also tried the OpenCL reciprocal functions half_recip() and native_recip(), with the same results.

Is there a way to force OpenCL to compute the reciprocal in half precision, without first converting to single precision?

Bram
  • Did you try ordinary division with a constant 1 as numerator? – Simon Goater May 04 '23 at 11:12
  • @SimonGoater yes, it is highlighted in the screenshot on the left. It even uses the correct 16b literal. – Bram May 04 '23 at 15:09
  • Is the type of rad_x half (maybe show/say that)? What are the semantics of (1.0h / ...)? Is there something similar to hrcp (an intrinsic)? I see that AMD does have v_rcp_f16 https://www.amd.com/system/files/TechDocs/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf – Tim May 05 '23 at 18:29
  • What are the compile flags? – huseyin tugrul buyukisik May 15 '23 at 16:17

0 Answers