Questions tagged [half-precision-float]

half-precision 16-bit floating point

Most uses of 16-bit floating point are the IEEE 754 binary16 format, aka half precision, but other formats with different splits of exponent vs. significand bits are possible.

(However, related formats like Posit, which have similar uses but a different binary layout, are not covered by this tag.)

The tag wiki has links to more info, and lists other tags. (This tag was temporarily a synonym of another tag, but should stay separate because half-precision is less widely implemented than float / binary32 and double / binary64.)


16-bit floating point has less precision (significand, aka mantissa, bits) and less range (exponent bits) than the widely used 32-bit single-precision IEEE 754 binary32 float or the 64-bit binary64 double. But it takes half the space of float, reducing memory footprint and bandwidth requirements, and on some GPUs has better throughput.
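
As a concrete sketch of that layout (Python with numpy; the helper name is my own), binary16 splits its 16 bits as 1 sign + 5 exponent + 10 significand:

```python
import numpy as np

def half_fields(x):
    """Split a binary16 value into its sign, exponent, and significand bits."""
    bits = int(np.float16(x).view(np.uint16))
    sign = bits >> 15                # 1 sign bit
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits, bias 15
    significand = bits & 0x3FF       # 10 significand bits
    return sign, exponent, significand

print(half_fields(1.0))   # (0, 15, 0)   -> (-1)^0 * 2^(15-15) * 1.0
print(half_fields(-2.5))  # (1, 16, 256) -> (-1)^1 * 2^(16-15) * 1.25
```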

It's fairly widely supported on GPUs, but on x86 CPUs at least, support is limited to conversion to/from float. (And only on CPUs that support AVX and the F16C extension, e.g. Intel starting with Ivy Bridge.)
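
The semantics of that conversion are easy to observe from Python with numpy (whether a given numpy build uses F16C under the hood is a hardware/build detail, so treat this only as a sketch of the rounding behavior):

```python
import numpy as np

f32 = np.array([1.0, 0.1, 65504.0, 1e5], dtype=np.float32)
f16 = f32.astype(np.float16)   # float -> half: rounds to nearest half value
back = f16.astype(np.float32)  # half -> float: always exact

print(f16)   # 65504 is the largest finite half; 1e5 overflows to inf
print(back)  # 0.1 comes back as 0.0999755859375: only ~3 decimal digits survive
```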

If a CPU SIMD extension supported math on half-precision directly, it could process twice the elements per SIMD vector and thus achieve twice the throughput of float for vectorizable tasks. ARM's ARMv8.2-A FP16 extension does provide such instructions, but as of 2020 no comparable support exists on x86.

70 questions
0 votes · 0 answers

converting Golang float32 to half-precision float (GLSL float16) as uint16

I need to pass some data over from Go to a '300 es' shader. The data consists of two uint16s packed into a uint32. Each uint16 represents a half-precision float (float16). I found some PD Java code that looks like it will do the job, but I am…
Peter · 398 · 3 · 20
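
Not the asker's Go code, but a sketch of the packing itself in Python/numpy (names here are my own): two consecutive float16 values reinterpreted as one uint32, which is what a GLSL ES 3.00 shader unpacks with unpackHalf2x16:

```python
import numpy as np

pair = np.array([1.5, -2.0], dtype='<f2')  # two little-endian float16 values
packed = pair.view('<u4')[0]               # reinterpret the 4 bytes as one uint32
print(hex(packed))                         # low 16 bits = 1.5, high 16 bits = -2.0

# Unpacking on the other side (the shader would use unpackHalf2x16):
halves = np.array([packed], dtype='<u4').view('<f2')
print(halves)                              # [ 1.5 -2. ]
```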
0 votes · 1 answer

IEEE-754 Standard

I have a very easy question about the IEEE-754 standard, in which numbers are encoded and stored on the computer. At uni (exams) I have come across the following definition for the 16-bit IEEE-754 format (half precision): 1 sign bit, 6 exponent…
Okyanus · 3 · 3
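
For reference (not part of the question): IEEE 754 binary16 uses 1 sign, 5 exponent, and 10 significand bits, so a 6-bit exponent would be a non-standard variant. numpy can confirm the standard split:

```python
import numpy as np

info = np.finfo(np.float16)
print(info.nexp, info.nmant)  # 5 10 -> 1 + 5 + 10 = 16 bits
print(info.max)               # 65504.0, i.e. (2 - 2**-10) * 2**15
```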
0 votes · 1 answer

Converting Float32 to Float16

This is more of a follow-up to https://stackoverflow.com/a/5587983/13586005. @sam hocevar, or anybody else who understands this: would you mind explaining what is happening here: tmp = (tmp - 0x70) & ((unsigned int)((int)(0x70 - tmp) >> 4) >> 27); I'm…
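
A sketch of what that expression does, replicated with explicit 32-bit arithmetic in Python (here tmp is the float32's 8-bit biased exponent, and 0x70 = 112 = 127 - 15, the difference between the float and half exponent biases):

```python
def rebias_exponent(tmp):
    """Replicate (tmp - 0x70) & ((unsigned int)((int)(0x70 - tmp) >> 4) >> 27).

    If tmp > 0x70, the arithmetic shift of a negative number fills the high
    bits with ones, and the unsigned >> 27 turns that into the 5-bit mask
    0x1F. Otherwise the mask is 0, flushing underflowed exponents to zero.
    """
    diff = 0x70 - tmp                    # signed difference
    shifted = diff >> 4                  # Python's >> on ints is arithmetic
    mask = (shifted & 0xFFFFFFFF) >> 27  # emulate the unsigned 32-bit shift
    return (tmp - 0x70) & mask

print(rebias_exponent(0x7F))  # 0x0F: float exponent 127 -> half exponent 15
print(rebias_exponent(0x60))  # 0x00: underflow, result flushed to zero
```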
0 votes · 1 answer

Incomplete Cholesky Factorization Very Slow

Background: I'm doing a project for my Numerical Linear Algebra course. For this project I decided to experiment with doing incomplete Cholesky factorization in half-precision arithmetic and using the result as a preconditioner for iterative…
Onye · 195 · 1 · 7

0 votes · 1 answer

Training with Keras/TensorFlow in fp16 / half-precision for RTX cards

I just got an RTX 2070 Super and I'd like to try out half-precision training using Keras with the TensorFlow backend. So far I have found articles like this one that suggest using these settings: import keras.backend as…
Eduardo G.R. · 377 · 3 · 18
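
A sketch of the modern way to do this, using TensorFlow 2's mixed-precision API rather than the older keras.backend floatx setting such articles describe (the model here is a placeholder of my own; since TF 2.4, compile() wraps the optimizer with loss scaling automatically under this policy):

```python
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Compute in float16 (using RTX tensor cores), keep variables in float32:
mixed_precision.set_global_policy('mixed_float16')

model = tf.keras.Sequential([
    layers.Dense(256, activation='relu', input_shape=(784,)),
    # Keep the final softmax in float32 for numerical stability:
    layers.Dense(10, activation='softmax', dtype='float32'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```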
0 votes · 1 answer

Half-precision floating-point

I have a small question about half-precision IEEE-754. 1) I have the following exercise: 13.7625 shall be written in 16 bit (half precision), so I started to convert the number from DEC to binary and I got this: 13.7625 = 1101.1100001100₂ all in…
StudentAccount4 · 186 · 1 · 11
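
To sketch the rest of that conversion: normalize, re-bias the exponent by 15, and round the significand to 10 bits; numpy agrees with the hand result:

```python
import numpy as np

# 13.7625 = 1101.1100001100110011..._2 = 1.1011100001100110011..._2 * 2^3
# sign = 0; exponent = 3 + 15 = 18 = 10010_2;
# significand rounded to 10 bits: round(0.7203125 * 1024) = 738 = 1011100010_2
bits = (0 << 15) | (18 << 10) | 738
print(hex(bits))                                    # 0x4ae2
print(np.float16(13.7625).view(np.uint16) == bits)  # True
print(float(np.float16(13.7625)))                   # 13.765625 after rounding
```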
0 votes · 0 answers

How to obtain the half precision floating point representation of a number?

I want to obtain the binary representation of a variable x in half precision floating point representation. x can be anything (like -1.25 or 10 etc). I have tried quite a few things but can't really get this right. I have tried making my own…
Black Jack 21 · 315 · 4 · 19
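
A sketch of the short route (Python/numpy; the helper name is my own): let the library do the rounding, then format the reinterpreted bits:

```python
import numpy as np

def half_bits(x):
    """Binary16 representation of x as a 16-character bit string."""
    return format(int(np.float16(x).view(np.uint16)), '016b')

print(half_bits(-1.25))  # 1011110100000000
print(half_bits(10))     # 0100100100000000
```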
0 votes · 1 answer

How do we minimize precision error with FP16 half precision floating point numbers

I have one example: 50.33123 can be stored in FP32 (1.8.23) format as 0x4249532E. If we convert this to binary: 0100 0010 0100 1001 0101 0011 0010 1110. The first bit is the sign bit, which is 0, meaning a positive number. The next 8 bits are the exponent -> 1000 0100₂…
sathyarokz · 13 · 6
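
A sketch that verifies the decoding and shows the nearest half-precision value (the error bound follows from the 10+1 significand bits, roughly 3 decimal digits):

```python
import struct
import numpy as np

f = struct.unpack('<f', struct.pack('<I', 0x4249532E))[0]
print(f)                         # ~50.33123, as stored in FP32
print(float(np.float16(f)))      # 50.34375: nearest FP16, error ~0.0125
print(np.finfo(np.float16).eps)  # ~0.000977: relative step between halves
```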
-1 votes · 1 answer

List of ARM instructions implementing half-precision floating-point arithmetic

Arm Architecture Reference Manual for A-profile architecture (emphasis added): FPHP, bits [27:24] 0b0011 As for 0b0010, and adds support for half-precision floating-point arithmetic. A simple question: where can one find a list of ARM instructions…
pmor · 5,392 · 4 · 17 · 36

-1 votes · 1 answer

How to Convert OpenCL code from FP32 to FP16?

Is there a way to automatically convert code that was written to do FP32 calculations on an FP32 GPU, so that it always does FP16 calculations instead of FP32? What I'm trying to achieve is to run code for an old GPU (that doesn't support HALF…
JosEvora · 134 · 1 · 9