
Assume we have two floating point values: 1.23 and 4.56. To represent and add these on a machine without floating point support, we have to fall back to a fixed point representation.

So we pick the number 100 as a scaling factor, simply to get rid of the decimal points:

1 - Multiply them by the scaling factor => 123 and 456

2 - Add them: 123 + 456 = 579

3 - Divide the sum by the same scaling factor => 5.79

Which is equal to the floating point add 1.23 + 4.56 = 5.79

Now, why do I keep reading in online articles that the scaling factor tends to be a power of two?

https://en.wikipedia.org/wiki/Scale_factor_(computer_science)

https://www.allaboutcircuits.com/technical-articles/fixed-point-representation-the-q-format-and-addition-examples/

If I choose, say, 2^5 = 32 as my scaling factor, then we have:

-> 1.23 * 32 = 39.36 ~= 39
-> 4.56 * 32 = 145.92 ~= 145
-> 39 + 145 = 184
-> 184 / 32 = 5.75

The output of 5.75 is not even precise. So why do we pick a power of 2? Why don't we just pick a power of 10 as the factor?

Edit

I have also seen in such posts: https://spin.atomicobject.com/2012/03/15/simple-fixed-point-math/

That a power of two is chosen because computers can compute with it fast, e.g. scaling by 2^16 can be done with a bit shift, `x << 16`, but scaling by a power of 10 can't be computed as fast.

So is that it? We basically destroy precision for a bit of latency (if at all)?

Dan
  • Think about it this way: instead of 1 + 2/10 + 3/100, you're dealing with 1 + 1/2 + 1/8. – Mad Physicist Jun 01 '21 at 04:07
  • Using binary versus decimal does not “destroy precision”. Accuracy is lost in the examples you gave for two reasons: One, in the binary sample, you used a much smaller scaling factor (32) than you did in the decimal sample (100). Two, the numbers in your sample were exactly representable in decimal. Such decimal numbers are common only where humans have already rounded numbers to decimal or created them that way. They do not occur frequently in nature—⅓ is not exactly representable in decimal, masses and speeds of objects are never or almost never exactly decimal numbers of grams or m/s. – Eric Postpischil Jun 01 '21 at 10:34
  • Using decimal fixed-point for the general distribution of numbers that occur in mathematics and physics will have rounding errors just as binary fixed-point will. – Eric Postpischil Jun 01 '21 at 10:36
  • @EricPostpischil thanks, so essentially, the reason power of 2 is chosen in binary is because calculation becomes faster using bit shifting? – Dan Jun 01 '21 at 12:13
  • @Dan: Yes. I would post an answer to that effect but have to go out now and would want to ponder whether there are additional reasons. – Eric Postpischil Jun 01 '21 at 12:25
  • @EricPostpischil How do you choose the scale factor? is it essentially "the more bits the better"? – Dan Jun 01 '21 at 14:53
  • @Dan note that of all the operations you mention, the division, `188 / 32`, is the most costly by far. When the value is known to be non-negative, this can be replaced with a shift operation, `188 >> 5`, which is much cheaper. This is a benefit of using the native base. But particularly in finance, the precision of decimal scaling is more important. (See decimal fixed-point example [here](https://johnmcfarlane.github.io/cnl#division).) – John McFarlane Jun 01 '21 at 23:46

1 Answer


Which is equal to the floating point add 1.23 + 4.56 = 5.79

Not quite.

1.23, 4.56, 5.79 as source code are exactly representable in decimal. As floating point encoded with binary64, they are not. Much like 0.3333 is not exactly one-third, IEEE-754 binary uses nearby values - within 1 part in 2^53. Thus the addition may provide the expected sum, or maybe a very close other sum will occur.

why do I keep reading on online articles that scaling factor tends to be a power of two?

With binary floating point, scaling by powers of 2 injects no precision loss. The product is exactly as good as its pre-scaled value.

Why don't we just pick a power of 10 as the factor?

Scaling by powers of 10 works well on paper (classical math), yet with binary floating point, the product likely is not exact and instead a rounded value is used. Thus our scaling injects a little error.

So is that it? we basically destroy precision for a bit of latency (if at all)?

No, there are many more issues. Since there are so many issues and speed is important, manufacturers of floating point hardware need an incredibly specific standard: IEEE-754. Even after 40 years, corner cases still come up. For over 20 years a decimal version of IEEE-754 has existed too. That portion of the overall spec is slowly getting realized in hardware instead of the slooooow software decimal floating point implementations. Until the marketplace drives wider acceptance, binary floating point, with its small differences from classical math (1.23 + 4.56), will continue to dominate over decimal floating point.

chux - Reinstate Monica
  • Thanks. How do you choose the scale factor? Is it essentially "the more bits the better"? Also, assume we have N bits to use for a fixed point value. Is it possible to represent every decimal within that range using fixed point, or are there decimals that can't be represented regardless of the number of bits we have? I already know that, for example, one-tenth (0.1) and one-hundredth (0.01) can be represented only approximately. – Dan Jun 01 '21 at 16:39
  • @Dan How many bits to use?: need to know largest fixed point value to encode, and precision needed. e.g. (log2(max_value/precision_needed) + 1 for sign). If not enough bits available, something has to give. – chux - Reinstate Monica Jun 01 '21 at 17:33
  • i see, on the second part of the question, can we show every decimal in a given range using fixed point? or there will always be number that can't be represented? – Dan Jun 01 '21 at 19:25
  • @Dan If the _fixed point_ is realized with an _integer_ type: yes, if with a binary FP type: no. – chux - Reinstate Monica Jun 01 '21 at 19:34
  • right, so representing decimals with fixed-point notation, using regular integers, will make some decimals unrepresentable. – Dan Jun 01 '21 at 19:51
  • @dan other way around. – chux - Reinstate Monica Jun 01 '21 at 20:08
  • So to confirm, representing decimals with floating-point notation will make some decimals unrepresentable. But with fixed point given a specific bit range, every decimal can be represented accurately? (except 0.1, 0.01 ... I guess)? – Dan Jun 01 '21 at 21:55
  • @Dan 0.01 scaled by 100 and saved as a fixed point integer is 100. Scaling back, by /100, 0.01 can result - as text - exactly. 0.01 (perhaps text) scaled by 100 and saved as a binary FP is 100.0. Scaling back down (double/100), the `double` quotient is about `0.0100000000000000002081...`. Certainly that rounds to a few decimals as `0.01` when printed - but the quotient is not exactly 0.01. – chux - Reinstate Monica Jun 01 '21 at 23:29