
Is there a way to automatically convert code that was written to do FP32 calculations on an FP32 GPU, so that it always does FP16 calculations instead of FP32?

What I'm trying to achieve is to take code written for an old GPU (one that doesn't support the half type) and run it on a newer GPU that does... but without going through the code by hand...

If that's not possible, could you shed some light on which documentation I should read to do it myself?

(The new GPU is a Radeon Vega Frontier, the driver is ROCm 1.9.1, and the OS is Ubuntu 18.04. The code is extensive and composed of different modules, so I won't be posting it here unless asked to.)

Peter Cordes
JosEvora
    The YOLO approach is a replace-all of "float" with "half", but I suspect that might not be 100% foolproof. – pmdj Nov 14 '18 at 12:37
  • There aren't many float types in the code really, but many ints. I've tried changing them all to short... but again, that wasn't clean and errors came along... so I'd need to redo the whole code... – JosEvora Nov 15 '18 at 01:57
  • If there isn't much float maths, hardware FP16 support is going to have pretty limited effect. For optimising integer code, going through all uint/uint and int/int multiplications and checking if it's safe to replace them with [`mul24`](https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/mul24.html) or even [`mad24`](https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/mad24.html) calls can make a big difference. I'm not sure how AMD hardware performs on short multiplications versus mul24, they may or may not be even faster. – pmdj Nov 15 '18 at 18:37
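As a sketch of the mul24/mad24 substitution mentioned in the comments above (the kernel name, arguments, and indexing pattern here are illustrative, not taken from the asker's code): `mad24` computes `a * b + c` using 24-bit multiply hardware, which is only safe when the multiplied operands fit in 24 bits.

```c
// Typical candidate: 2D index computation, where y and width are
// image coordinates that comfortably fit in 24 bits.
__kernel void copy_2d(__global const float *src,
                      __global float *dst,
                      int width)
{
    int x = get_global_id(0);
    int y = get_global_id(1);

    // Before: int idx = y * width + x;  (full 32-bit multiply)
    // After:  single faster op, valid only if y and width fit in 24 bits.
    int idx = mad24(y, width, x);

    dst[idx] = src[idx];
}
```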

1 Answer


No, there is no standard flag to say "treat float as half". You have to change "float" to "half". Also, your device must support fp16 calculations (many don't; they have only fp16 storage, converting to/from fp32 when you load/store). The cl_khr_fp16 extension adds the half scalar and vector types as built-in types that can be used for arithmetic operations. You'll need a #pragma in any kernel that uses them.
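A minimal sketch of what a kernel looks like after the conversion described above (the kernel name and arguments are hypothetical; the pragma and the use of `half` for arithmetic are what cl_khr_fp16 requires):

```c
// Enables half as a full arithmetic type; without this pragma,
// half may only be used via pointers for storage.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

// Before conversion this kernel would use __global float* and a
// float scalar; here every float has been changed to half.
__kernel void scale_fp16(__global const half *in,
                         __global half *out,
                         half factor)
{
    size_t i = get_global_id(0);
    out[i] = in[i] * factor;  // multiply performed in fp16
}
```

Note that the host side must then pass 16-bit values (e.g. cl_half) for scalar arguments and 16-bit buffers for the pointers; changing only the kernel source is not enough.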

Dithermaster
  • "your device must support fp16 calculations (many don't, just fp16 storage that converts to/from fp32 when you load/store)." that's what I'm saying, it does support something called RPM Rapid Packed Math, that is 16bit calculations, when input types are 16bit in size... – JosEvora Nov 15 '18 at 01:52
  • "cl_khr_fp16 extension adds support for half scalar and vector types as built-in types that can be used for arithmetic operations" I know that, it's already there... Made no difference what so ever... – JosEvora Nov 15 '18 at 01:54