4

I have been thinking recently about how floating-point math works on computers, and it is hard for me to understand all the technical details behind the formulas. I need to understand the basics of addition, subtraction, multiplication, division, and remainder. With these I will be able to build trig functions and formulas.

I can guess at some of it, but it's still a bit unclear. I know that a fixed-point number can be made by splitting a 4-byte integer into a sign flag, a radix, and a mantissa. With this we have a 1-bit flag, a 5-bit radix, and a 10-bit mantissa. A 32-bit word is perfect for a floating-point value :)

To add two floats, can I simply add the two mantissas and add the carry to the 5-bit radix? Is that a valid way to do floating-point math (or fixed-point math, to be accurate), or am I completely wrong?

All the explanations I have seen use formulas, multiplications, etc., and they look very complex for something I would guess to be a bit simpler. I need an explanation aimed more at beginning programmers and less at mathematicians.

  • Why are you trying to *write* new floating-point functions, instead of simply using the functions from the standard math library? – Daniel Pryden Jul 07 '10 at 21:56
  • I agree that's what he *should* do. However, there's certainly nothing wrong with wanting to know how it works. There should be more folks like that. – T.E.D. Jul 07 '10 at 22:09
  • @Daniel Because he wants to learn? Essentially all of the exercises in "The C Programming Language" by K&R have you implement common unix utilities. – Tyler Jul 07 '10 at 22:11
  • I want to learn what I can about the inner workings of microprocessors. I want to be an Engineer. I want to put these features in a tiny VM I have in mind. It would be very cool to see it working and computing things without the use of specialized hardware. – Leandro Jardim Jul 07 '10 at 22:30

3 Answers

2

See Anatomy of a floating point number

John D. Cook
1

Run, don't walk, to get Knuth's Seminumerical Algorithms which contains wonderful intuition and algorithms behind doing multiprecision and floating point arithmetic.

Ira Baxter
1

The radix depends on the representation; if you use radix r = 2 you can never change it, because the number doesn't even carry any data telling you which radix it has. I think you're mistaken and you mean the exponent.

To add two numbers in floating point you must make one exponent equal to the other by shifting the mantissa. Shifting one bit right means exponent + 1, and one bit left means exponent - 1; once the numbers have the same exponent you can add them.

Value(x) = mantissa * radix ^ exponent

Adding these two numbers:

    101011 * 2 ^ 13
    001011 * 2 ^ 12

would be the same as adding:

    101011 * 2 ^ 13
    000101 * 2 ^ 13

After making the exponents equal you can operate. You also have to know whether the representation has an implicit bit. I mean, the most significant bit of the mantissa must be a 1, so, as in the IEEE standard, it is known to be there but is not stored, although it is used when operating.

I know this can be a bit confusing and I'm not the best teacher, so if you have any doubts, just ask.

sui
  • I really meant exponent. Thanks :) Math is always a bit confusing, but that seems right to me. Don't worry, you are a good teacher. :) – Leandro Jardim Jul 07 '10 at 22:42
  • I seem to remember a long diatribe by Kahan about pre-IEEE 754 systems that use the same number of digits the format has for aligning the mantissas, and the bad properties such systems have. You certainly do not get correctly rounded results if you do this. I don't remember which one it is but it is one of the publications on http://www.cs.berkeley.edu/~wkahan/ – Pascal Cuoq Apr 14 '11 at 07:57