Questions tagged [half-precision-float]

half-precision 16-bit floating point

Most uses of 16-bit floating point are the IEEE 754 binary16 (aka half-precision) format, but other formats with different splits between exponent and significand bits are possible.

(However, related formats like Posit, which have similar uses but a different binary format, are not covered by this tag.)

The tag wiki has links to more info, and lists other tags. (This tag was temporarily made a synonym of another tag, but should stay separate because half-precision is less widely implemented than float / binary32 and double / binary64.)


16-bit floating point has less precision (mantissa aka significand bits) and less range (exponent bits) than the widely used 32-bit single-precision IEEE 754 binary32 float or 64-bit binary64 double. But it takes less space, reducing memory bandwidth requirements, and on some GPUs has better throughput.
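For a concrete feel for both trade-offs, here is a small sketch using NumPy's float16 (which implements binary16); the exact printed digits may differ slightly between NumPy versions:

```python
import numpy as np

# Precision: binary16 stores a 10-bit fraction (11 significant bits),
# i.e. roughly 3 significant decimal digits.
x = np.float32(3.14159265)
print(np.float16(x))                  # ~3.140625, the nearest half-precision value

# Range: a 5-bit exponent means the largest finite value is 65504.
print(np.finfo(np.float16).max)       # 65504.0
print(np.float16(70000.0))            # inf (out of half-precision range)

# Storage: half the bytes of float32, which is where the bandwidth saving comes from.
print(np.zeros(1024, dtype=np.float16).nbytes)   # 2048, vs 4096 for float32
```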

It's fairly widely supported on GPUs, but on x86 CPUs at least, support is limited to conversion to/from float. (And only on CPUs that support AVX and the F16C extension, e.g. Intel starting with Ivy Bridge.)

If any CPU SIMD extension supported math on half-precision directly, it would have twice the elements per SIMD vector and thus twice the throughput of float for vectorizable tasks. But such support is not widespread in 2020 if it exists at all.

70 questions
0 votes, 1 answer

Using half precision with CuPy

I am trying to compile a simple CUDA kernel with CuPy using the half precision format provided by the cuda_fp16 header file. My kernel looks like this: code = r''' extern "C" { #include <cuda_fp16.h> __global__ void kernel(half * const f1, half *…
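A pattern commonly reported to work with CuPy's RawKernel is to keep the #include outside the extern "C" block (cuda_fp16.h contains C++ declarations that must not get C linkage) and pass float16 arrays from the host. A minimal sketch, assuming a GPU with native half arithmetic (sm_53 or newer) and that NVRTC can find the CUDA headers; the kernel name here is made up:

```python
import cupy as cp

code = r'''
#include <cuda_fp16.h>
extern "C" __global__
void double_half(half * const data, const int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = __hadd(data[i], data[i]);   // data[i] *= 2, done in half precision
}
'''

kernel = cp.RawKernel(code, 'double_half')

n = 1024
x = cp.arange(n, dtype=cp.float16)
kernel((n // 256,), (256,), (x, cp.int32(n)))   # (grid, block, kernel args)
print(x[:4])                                    # expect [0. 2. 4. 6.]
```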
0 votes, 0 answers

16-bit floating point division (half-precision)?

How can I divide a 16-bit floating-point number by a 16-bit floating-point number (half-precision)? I did the sign with an XOR gate and the exponent with a 5-bit subtractor, but couldn't do the mantissa. How can I do the normalizing and rounding? (Logisim)
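A software reference model can make the normalize/round step concrete before wiring it up in Logisim. This is only a sketch for normal, nonzero, finite inputs (no subnormals, infinities, NaN, or exponent overflow/underflow), using round-to-nearest-even:

```python
def fp16_div(a: int, b: int) -> int:
    """Divide two binary16 values given as 16-bit integers (normal inputs only)."""
    sa, ea, fa = a >> 15, (a >> 10) & 0x1F, a & 0x3FF
    sb, eb, fb = b >> 15, (b >> 10) & 0x1F, b & 0x3FF

    sign = sa ^ sb                    # sign: just an XOR
    exp  = ea - eb + 15               # subtract exponents, re-add the bias
    ma, mb = 0x400 | fa, 0x400 | fb   # restore the implicit leading 1

    # Keep the quotient in [1, 2): if ma < mb it starts 0.1..., so take one
    # more quotient bit and decrement the exponent instead.
    shift = 20
    if ma < mb:
        shift, exp = 21, exp - 1

    q, r = divmod(ma << shift, mb)    # leading 1 + 10 result bits + 10 rounding bits
    result = q >> 10
    round_bits = q & 0x3FF            # first discarded bits
    sticky = r != 0                   # anything nonzero below those

    # Round to nearest, ties to even.
    if round_bits > 0x200 or (round_bits == 0x200 and (sticky or (result & 1))):
        result += 1
        if result == 0x800:           # carry out of the 11-bit significand
            result >>= 1
            exp += 1

    return (sign << 15) | (exp << 10) | (result & 0x3FF)

print(hex(fp16_div(0x3E00, 0x3800)))  # 1.5 / 0.5 = 3.0 -> 0x4200
```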
Arthur • 1 • 1
0 votes, 0 answers

Deviation caused by half() in PyTorch

I have run into an issue where the value of a tensor is 6.3982e-2 in float32. After I changed it to float16 using the half() function, it became 6.3965e-2. Is there a method to convert the tensor without this deviation?
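This is expected behaviour rather than a bug in half(): near 0.064 the gap between adjacent float16 values is 2**-14 (about 6.1e-5), so 6.3982e-2 is simply not representable and gets rounded to the nearest half-precision value, 6.3965e-2. The only way to avoid the deviation is to keep such values in float32. A quick check with NumPy's float16, which uses the same binary16 format:

```python
import numpy as np

x = np.float32(6.3982e-2)
y = np.float16(x)
print(y)                    # ~0.063965, the nearest representable half-precision value
print(float(y) - float(x))  # about -1.7e-5, i.e. less than half a ULP (2**-14 / 2)
```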
zhangbw • 11 • 4
0 votes, 0 answers

How do I know whether Tensor Cores are used in PyTorch (for FP16, bfloat16, INT8)?

From the PyTorch documentation it is very hard to know whether a model is using Tensor Cores or not (for FP16, bfloat16, INT8). What I know so far: FP32 will not run on Tensor Cores, since it is not supported. Enabling TF32 for PyTorch will run your model in…
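PyTorch has no flag that reports whether Tensor Cores were actually used; the usual approach is to opt in to the eligible dtypes and then confirm with a GPU profiler (e.g. Nsight Systems/Compute), where the tensor-op GEMM/convolution kernels show up by name. A sketch of the opt-in side, assuming a recent PyTorch and an Ampere-or-newer GPU:

```python
import torch

# TF32: lets FP32 matmuls/convolutions use Tensor Cores on Ampere and newer GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(64, 1024, device="cuda")

# FP16 / bfloat16 autocast: eligible ops run in half precision and can hit
# Tensor Cores; whether they actually did is only visible in a profiler trace.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)
print(y.dtype)   # torch.float16
```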
michelvl92 • 61 • 6
0 votes, 1 answer

atomicAdd half-precision floating-point (FP16) on CUDA Compute Capability 5.2

I am trying to atomically add a float value to a __half in CUDA on compute capability 5.2. This architecture does support the __half data type and its conversion functions, but it does not include any arithmetic or atomic operations for halves, like atomicAdd(). I…
Skip • 30 • 2 • 6
0 votes, 0 answers

Is there a reason why a NaN value appears when there is no NaN value in the model parameters?

I want to train the model with FP32 and perform inference with FP16. For other networks (ResNet), FP16 worked, but EDSR (super-resolution) with FP16 did not. The differences I found are that ReLU with inplace=True in EDSR PixelShuffle…
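A frequent cause of this pattern (training is fine in FP32, inference in FP16 produces NaN) is overflow rather than a bad parameter: FP16 tops out at 65504, so one large intermediate activation becomes inf, and a later operation on that inf (inf - inf, inf * 0, inf / inf) yields NaN even though every weight is finite. Super-resolution networks can produce large activations, which would fit the EDSR symptom. A tiny illustration with NumPy's float16:

```python
import numpy as np

act = np.float16(60000.0)    # a large but finite FP16 activation
big = act + act              # 120000 exceeds 65504 -> inf
print(big)                   # inf
print(big - big)             # nan (inf - inf is undefined; NumPy may warn here)
```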
SIwoo Lee • 3 • 2
0 votes, 1 answer

Can float16 data type save compute cycles while computing transcendental functions?

It's clear that float16 can save bandwidth, but can float16 also save compute cycles when computing transcendental functions, like exp()?
Leonardo Physh • 1,443 • 2 • 9 • 12
0 votes, 1 answer

What are vector division and multiplication as in CUDA __half2 arithmetic?

__device__ __half2 __h2div ( const __half2 a, const __half2 b ) Description: Divides half2 input vector a by input vector b in round-to-nearest mode. __device__ __half2 __hmul2 ( const __half2 a, const __half2 b ) Description: Performs half2…
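"Vector" here just means that the two half lanes packed into one __half2 are processed elementwise, SIMD-style, which is what gives packed FP16 its doubled throughput. The same semantics expressed with NumPy:

```python
import numpy as np

a = np.array([1.5, 4.0], dtype=np.float16)   # stands in for one __half2
b = np.array([0.5, 2.0], dtype=np.float16)

print(a / b)   # [3. 2.]    lane-by-lane, like __h2div(a, b)
print(a * b)   # [0.75 8.]  lane-by-lane, like __hmul2(a, b)
```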
Aryan • 430 • 4 • 12
0 votes, 0 answers

How to round up or down when converting f32 to bf16 in Rust?

I am converting from f32 to bf16 in Rust, and want to control the direction of the rounding error. Is there an easy way to do this? Converting using the standard bf16::from_f32 rounds to the nearest bf16 value, which can be either larger or smaller…
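The question is about Rust, but the trick is language-agnostic: a bf16 is just the top 16 bits of an f32, so truncating the low 16 bits rounds toward zero, and directed rounding only needs a +1 on the truncated value whenever a discarded bit was set on the appropriate side of zero. A Python sketch of the idea (helper names are mine, and NaN/Inf inputs are not handled); in Rust the same bit fiddling should work via f32::to_bits and the half crate's bf16::from_bits:

```python
import struct

def f32_bits(x: float) -> int:
    return struct.unpack('<I', struct.pack('<f', x))[0]

def bf16_as_f32(hi: int) -> float:
    return struct.unpack('<f', struct.pack('<I', hi << 16))[0]

def f32_to_bf16_down(x: float) -> int:
    """Round toward -infinity: the bf16 result is <= x."""
    bits = f32_bits(x)
    hi, lo = bits >> 16, bits & 0xFFFF
    if lo and (bits >> 31):      # negative and inexact: truncation moved up, bump the magnitude
        hi += 1
    return hi

def f32_to_bf16_up(x: float) -> int:
    """Round toward +infinity: the bf16 result is >= x."""
    bits = f32_bits(x)
    hi, lo = bits >> 16, bits & 0xFFFF
    if lo and not (bits >> 31):  # positive and inexact: truncation moved down, bump the magnitude
        hi += 1
    return hi

x = 1.0000001
print(bf16_as_f32(f32_to_bf16_down(x)), bf16_as_f32(f32_to_bf16_up(x)))  # 1.0 1.0078125
```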
Amir • 888 • 9 • 18
0 votes, 1 answer

Convert 16 bit hex value to FP16 in Python?

I'm trying to write a basic FP16-based calculator in Python to help me debug some hardware. I can't seem to find how to convert 16-bit hex values into floating point values I can use in my code to do the math. I see lots of online references to numpy but…
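No third-party library is strictly needed: since Python 3.6 the struct module has an 'e' format code for IEEE 754 binary16, so a hex string can be decoded directly (0x4248 encodes 3.140625):

```python
import struct

raw = "4248"                                          # big-endian hex of a half-precision value
value = struct.unpack('>e', bytes.fromhex(raw))[0]    # 'e' = binary16
print(value)                                          # 3.140625

print(struct.pack('>e', value).hex())                 # back to '4248'

# NumPy equivalent, if arrays are more convenient:
# np.frombuffer(bytes.fromhex(raw), dtype='>f2')[0]
```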
ajcrm125 • 323 • 2 • 12
0 votes, 2 answers

Why is it dangerous to convert integers to float16?

I have run recently into a surprising and annoying bug in which I converted an integer into a float16 and the value changed: >>> import numpy as np >>> np.array([2049]).astype(np.float16) array([2048.], dtype=float16) >>>…
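The underlying reason: float16 has an 11-bit significand, so only integers up to 2048 are all exactly representable; beyond that the spacing between representable values is 2 (then 4, 8, ...), and 2049 lands exactly halfway, rounding to the even neighbour 2048. A quick look at the boundary:

```python
import numpy as np

print(np.float16(2048))   # 2048.0 - still exact (2**11)
print(np.float16(2049))   # 2048.0 - halfway between 2048 and 2050, ties go to even
print(np.float16(2050))   # 2050.0 - representable again; the spacing here is 2
print(np.finfo(np.float16).nmant)   # 10 stored fraction bits -> 11-bit significand
```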
guhur • 2,500 • 1 • 23 • 33
0 votes, 2 answers

Bit shifting a half-float into a float

I have no choice but to read in 2 bytes that make up a half-float. I would like to work with this in the form of a 4-byte float. I've done some research and the only thing I can come up with is bit shifting. My only issue is that I don't fully…
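If a manual decode really is required, the layout is 1 sign bit, 5 exponent bits (bias 15) and 10 fraction bits. Here is a sketch in Python of the bit-shifting approach (in C you would instead widen the fields into the 32-bit float layout); struct's built-in 'e' format is only used to generate a test value:

```python
import struct

def half_bytes_to_float(b: bytes, little_endian: bool = True) -> float:
    """Decode 2 bytes holding an IEEE-754 binary16 value by hand."""
    bits = int.from_bytes(b, 'little' if little_endian else 'big')
    sign = -1.0 if bits & 0x8000 else 1.0
    exp  = (bits >> 10) & 0x1F
    frac = bits & 0x3FF

    if exp == 0:                                   # zero or subnormal
        return sign * frac * 2.0 ** -24
    if exp == 0x1F:                                # infinity or NaN
        return sign * float('inf') if frac == 0 else float('nan')
    return sign * (1 + frac / 1024.0) * 2.0 ** (exp - 15)

raw = struct.pack('<e', 0.333251953125)            # two bytes of a known half value
print(half_bytes_to_float(raw))                    # 0.333251953125
```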
0 votes, 1 answer

Declaring Half precision floating point memory in SYCL

I would like to know and understand how one can declare half-precision buffers and pointers in SYCL, namely in the following ways - via the buffer class, and using the malloc_device() function. Also, suppose I have an existing fp32 matrix / array on the…
Atharva Dubey • 832 • 1 • 8 • 25
0 votes, 1 answer

Convert ieee754 half-precision bytes to double and vice versa in Flutter

I have a device that provides temperature data in ieee754 half-precision float format, i.e. [78, 100] = +25.5C. Now, Dart/Flutter doesn't support HP-Float conversions out of the box. After googling around, I have found several solutions that I was…
Pavel • 565 • 6 • 19
0 votes, 0 answers

How to Initialise 16-bit Half Floats (GAS for ARM32)?

When writing an ARM assembly program one can use data type directives to initialise some values. For example, in the example below we are initialising a single float: label: .single 0.0 However, when storage space matters, on the ARM platform…