This algorithm is a fixed-point division of two unsigned 16-bit fractional values in [0,1). It first computes an initial 9-bit approximation to the reciprocal of the divisor via a table lookup, refines this with a single Newton-Raphson iteration for the reciprocal, xi+1 := xi * (2 - d * xi), resulting in a reciprocal accurate to about 16 bits, finally multiplies this by the dividend, yielding a 17-bit quotient in [0,2).
For the table lookup, the divisor is first normalized into [0.5, 1) by applying a scale factor 2z, obviously the dividend then needs to be adjusted by the same scale factor. Since the reciprocals of operands in [0.5, 1), are going to be [1,2], the integer bit of the reciprocal is known to be 1, so one can use 8-bit table entries to produce a 1.8 fixed-point reciprocal by adding 0x100
(= 1). The reason 0x101
is used here is not clear; it may be due to a requirement that this step always provides an overestimate of the true reciprocal.
The next two steps are verbatim translations of the Newton-Raphson iteration for the reciprocal taking into account fixed-point scale factors. So 0x2000000
represents 2.0. The code uses 0x2000080
since it incorporates a rounding constant 0x80
(=128) for the following division by 256, used for rescaling the result. The next step likewise adds 0x00000080
as a rounding constant for a rescaling division by 256. Without the scaling, this would be a pure multiplication.
The final multiplication n*d
multiplies the reciprocal in d
with the dividend in n
to yield the quotient in 33 bits. Again, a rounding constant of 0x8000 is applied before dividing by 65536 to rescale into the proper range, giving a quotient in 1.16 fixed-point format.
Continuous rescaling is typical of fixed-point computation where one tries to keep intermediate results as large as possible to maximize the accuracy of the final result. What is a bit unusual is that rounding is applied in all of the intermediate arithmetic, rather than just in the final step. Maybe it was necessary to achieve a specified level of accuracy.
The function is not all that accurate, though, likely caused by inaccuracy in the initial approximation. Across all non-exceptional cases, 2,424,807,756 match a correctly rounded 1.16 fixed-point result, 780,692,403 have an error of 1 ulp, 15,606,093 have a 2-ulp error, and 86,452 have a 3-ulp error. In a quick experiment, the maximum relative error in the initial approximation u
was 3.89e-3. An improved table lookup reduces the maximum relative error in u
to 2.85e-3, reducing but not eliminating 3-ulp errors in the final result.
If you want to look at a specific example, consider h
=0.3 (0x4ccd
) divided by SZ3
=0.2 (0x3333
). Then z
=2, thus d
=0.2*4 = 0.8 (0xcccc
). This leads to u
= 1.25 (0x140
). Since the estimate is quite accurate, we expect (2 - d * u) to be near 1, and in fact, d
= 1.000015 (0x10001
). The refined reciprocal comes out to d
=1.250015 (0x14001
), and quotient is therefore n
=1.500031 (0x18002
).