Well, I feel a bit embarrassed that I can't figure this out on my own, but here goes.
How can I reduce the mantissa (and exponent) bit-width of a floating-point number?
I am training a (convolutional) artificial neural network that I am implementing on an FPGA, and I'd like to study, on CPU (and GPU), how the mantissa (and exponent) bit-width affects testing (and training) accuracy. The next step would be converting my floats to a fixed-point representation (which is what I use on the FPGA) and seeing how that behaves.
Similar studies have already been done by others ([Tong, Rutenbar and Nagle (1998)] and [Leeser and Zhao (2003)]), so there should be a way of doing this, although the 'how' is not yet clear to me.
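To make the question concrete, the operation I imagine is something like the following (a minimal Lua sketch, more on my setup below; `reduce_mantissa` and its parameters are just names I made up): reinterpret the float32 as a 32-bit integer, zero out the low mantissa bits, and reinterpret it back, so the value only carries the precision a narrower mantissa would give.

```lua
local ffi = require("ffi")
local bit = require("bit")

-- A float/uint32 view onto the same 4 bytes, so we can poke at the bits.
local view = ffi.new("union { float f; uint32_t u; }")

-- Keep only `mant_bits` of the 23 mantissa bits of an IEEE-754 single
-- (plain truncation; round-to-nearest would take a little more work).
local function reduce_mantissa(x, mant_bits)
  view.f = x
  local drop = 23 - mant_bits                      -- low mantissa bits to zero out
  local mask = bit.bnot(bit.lshift(1, drop) - 1)   -- e.g. mant_bits = 10 -> 0xFFFFE000
  view.u = bit.band(view.u, mask)
  -- Shrinking the *exponent* would be a clamping problem instead: values whose
  -- exponent falls outside the narrower range get flushed to zero or saturated.
  return view.f
end

print(math.pi, reduce_mantissa(math.pi, 10))
```

Is this the right idea, or is there a more standard way to do it?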
Last point: I'm programming in Lua, but I can easily pull in C code through LuaJIT's ffi, so a C-based solution would work for me as well.
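For the later fixed-point step, I picture something along these lines (again only a sketch of what I mean; `to_fixed` and the Q-format parameters are arbitrary choices of mine): scale, round to the nearest integer code, saturate to the representable range, and scale back.

```lua
-- Simulate a signed fixed-point format with `int_bits` integer bits (sign included)
-- and `frac_bits` fractional bits: scale, round, saturate, scale back.
local function to_fixed(x, int_bits, frac_bits)
  local scale = 2 ^ frac_bits
  local max_q =  2 ^ (int_bits + frac_bits - 1) - 1   -- largest integer code
  local min_q = -2 ^ (int_bits + frac_bits - 1)       -- smallest integer code
  local q = math.floor(x * scale + 0.5)               -- round to nearest code
  if q > max_q then q = max_q end
  if q < min_q then q = min_q end
  return q / scale                                     -- back to a real number for simulation
end

print(to_fixed(1.2345, 4, 12))   -- e.g. Q4.12, 16 bits in total
```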