
I have implemented a 32-bit IEEE-754 floating-point square root using the Newton-Raphson method (in assembly), based on finding the reciprocal of the square root. I am using the round-to-nearest rounding mode. My square root routine only accepts normalized values and zeros, but no denormalized values or special values (NaN, Inf, etc.).

I am wondering how I can *achieve* correct rounding (with assembly-like instructions) so that my results are correct (per IEEE-754) for all inputs. Basically, I know how to test whether my results are correct; what I want is to adjust the algorithm below so that I obtain correctly rounded results. What instructions should I add to the algorithm?

See: Determining Floating Point Square Root for more information

Thank you!

Veridian

2 Answers


There are only about 2 billion floats matching your description. Try them all, compare against `sqrtf` from your C library, and examine all differences. If you are worried, you can get a higher-precision square root using `sqrt` or `sqrtl` from your C library. `sqrt`, `sqrtf`, and `sqrtl` are correctly rounded by typical C libraries, though, so a direct comparison ought to work.
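
A minimal sketch of such an exhaustive test in C, assuming the routine under test is exposed as `my_sqrtf` (a hypothetical name for your assembly routine) and that the host's `sqrtf` is correctly rounded:

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

extern float my_sqrtf(float x);  /* your routine, linked in separately */

int main(void)
{
    uint64_t mismatches = 0;
    /* Walk every non-negative bit pattern up to FLT_MAX: +0 and all
       positive normals. Denormals are skipped, per the question. */
    for (uint32_t bits = 0; bits <= 0x7F7FFFFFu; bits++) {
        uint32_t e = (bits >> 23) & 0xFF;
        if (e == 0 && bits != 0)
            continue;                    /* skip denormals */
        float x, mine, ref;
        memcpy(&x, &bits, sizeof x);
        mine = my_sqrtf(x);
        ref  = sqrtf(x);
        if (memcmp(&mine, &ref, sizeof mine) != 0) {
            mismatches++;
            printf("x=%08X mine=%a ref=%a\n", (unsigned)bits, mine, ref);
        }
    }
    printf("%llu mismatches\n", (unsigned long long)mismatches);
    return 0;
}
```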

tmyklebu
  • Not sure why this got a downvote, but as an initial test this seems fine. An exhaustive test of 2^31 inputs is quite feasible, and in fact common when testing single-input single-precision math functions. While there is always a chance that a third-party library used as a reference has bugs, in practice there is little risk for something like sqrt(). And tmyklebu suggested examining any mismatches, not simply assuming that the reference must be correct. A similar approach would be to test against a relevant SSE intrinsic. – njuffa Jul 17 '13 at 21:07
  • in fact, there are many fewer than 2 billion, if you consider that only two values of the exponent are interesting (a factor of 4 in the input becomes a factor of 2 in the result, affecting only the result's exponent, which should be no problem). So there are 6 bits fewer, which makes 33554432 `float`s to check. – Walter Tross Jul 17 '13 at 21:07
  • @tmyklebu, I am wondering what instructions are necessary to add to the algorithm I posted so that I can obtain correctly rounded results. I know how to compare my results to other algorithms already, what I need is what is missing from my algorithm to ensure correctly rounded results for a reciprocal square root algorithm. I thought that I might get some answers based upon Sterbenz's theorem for obtaining the correctly rounded result. – Veridian Jul 17 '13 at 21:35
  • To test the mantissa portion only (which is the harder part), one could indeed restrict testing to cover all possible operands in two consecutive binades, e.g. 1.0 <= x < 4.0, which requires 16.7M test vectors. But to make sure everything works correctly an exhaustive test is highly desirable. – njuffa Jul 17 '13 at 21:38
  • you are right, @njuffa, I should have said "there are 7 bits less", because the two values of the exponent count as 1 bit. – Walter Tross Jul 17 '13 at 21:44
  • @starbox: So it seems like your question is not "How can I check that my square root implementation returns correctly rounded results?" but rather "How do I achieve correctly rounded results in my square root implementation?" The specifics of the latter depend on how you implemented the square root and what operations you have at your disposal. In general, make sure you get to within one ulp, then round based on the residual x-sqrt(x)*sqrt(x) by conditionally incrementing or decrementing your preliminary result (a sketch of this step follows these comments). – njuffa Jul 17 '13 at 21:46
  • @njuffa, see the post below where I commented on this method. But yes, you are correct in that I want, "How do I achieve". I updated the question to reflect this. – Veridian Jul 17 '13 at 21:56
  • @starbox: You can do the same deconstructed-`fma` trick as I gave you in your other thread. then check for mismatches. It works almost everywhere for basically the same reason. Then you check it for all `float`s and fix the mismatches. – tmyklebu Jul 18 '13 at 02:15
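
A minimal sketch of the rounding step njuffa describes, in C. It assumes the preliminary result `y` is already within one ulp of the true square root, and it evaluates the residual x − y·y in `double`, where squaring a 24-bit significand is exact; on hardware without a wider type or FMA, the residual can instead be computed exactly with integer arithmetic, as in the answer below. The names `nudge` and `round_sqrt` are illustrative:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Move a positive float up (dir = +1) or down (dir = -1) by one ulp
   by stepping its bit pattern. */
static float nudge(float y, int dir)
{
    uint32_t b;
    memcpy(&b, &y, sizeof b);
    b += (uint32_t)dir;        /* unsigned wraparound handles dir = -1 */
    memcpy(&y, &b, sizeof y);
    return y;
}

/* Given x > 0 and a preliminary y within one ulp of sqrt(x),
   return the neighbor whose square is closest to x. */
static float round_sqrt(float x, float y)
{
    double r = (double)x - (double)y * (double)y;   /* exact here */
    /* Step toward the true root and test whether the neighbor wins. */
    float  z  = nudge(y, r > 0.0 ? +1 : -1);
    double rz = (double)x - (double)z * (double)z;
    /* Exact ties (rz == -r) would need round-to-even handling,
       omitted here for brevity. */
    return fabs(rz) < fabs(r) ? z : y;
}
```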

Why not square the result, and if it's not equal to the input, add or subtract (depending on the sign of the difference) a least significant bit, square, and check whether that would have given a better result?

"Better" here means a smaller absolute difference. The only case where this could get tricky is when the mantissa "crosses" √2, but this could be checked once and for all.

EDIT

I realize that the above answer is insufficient. Simply squaring in 32-bit FP and comparing to the input doesn't give you enough information. Let's say y = your_sqrt(x). You compare y² to x, find that y² > x, subtract 1 LSB from y obtaining z (y1 in your comments), then compare z² to x and find that not only z² < x but, within the available bits, y² − x == x − z². How do you choose between y and z? You should either work with all the bits (I guess this is what you were looking for), or at least with more bits (which I guess is what njuffa is suggesting).

From a comment of yours I suspect you are on strictly 32-bit hardware, but let me suppose that you have a 32-bit by 32-bit integer multiplication with a 64-bit result available (if not, it can be constructed). If you take the 23 bits of the mantissa of y as an integer, put a 1 in front, and multiply it by itself, you get a number that, except for a possible extra shift by 1, you can directly compare to the mantissa of x treated the same way. This way you have all 48 bits available for the comparison, and can decide without any approximation whether abs(y² − x) ≷ abs(z² − x).
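
A sketch of this exact comparison in C, assuming a 64-bit integer product is available (on the 16×16 multiplier mentioned in the comments it would have to be synthesized from partial products); `split_float`, `residual`, and `closer_to_sqrt` are illustrative names:

```c
#include <stdint.h>
#include <string.h>

/* Split a positive normalized float into its 24-bit significand
   (implicit leading 1 restored) and unbiased exponent. */
static void split_float(float f, uint32_t *sig, int *exp)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    *sig = (b & 0x007FFFFFu) | 0x00800000u;
    *exp = (int)((b >> 23) & 0xFF) - 127;
}

/* Exact residual x - y*y, expressed in units of 2^(2*ey - 46),
   where ey is y's unbiased exponent. With x = mx * 2^(ex - 23) and
   y*y = my*my * 2^(2*ey - 46), putting x on the same scale means
   shifting mx left by ex + 23 - 2*ey (23 or 24 when y ~ sqrt(x)),
   so everything fits in 64 bits and no information is lost. */
static int64_t residual(float x, float y)
{
    uint32_t mx, my;
    int ex, ey;
    split_float(x, &mx, &ex);
    split_float(y, &my, &ey);
    uint64_t yy = (uint64_t)my * my;        /* at most 48 bits, exact */
    int shift   = ex + 23 - 2 * ey;
    uint64_t xx = (uint64_t)mx << shift;    /* x on y*y's scale       */
    return (int64_t)xx - (int64_t)yy;
}

/* Pick whichever of the two candidates squares closer to x. */
static float closer_to_sqrt(float x, float y, float z)
{
    int64_t ry = residual(x, y), rz = residual(x, z);
    int64_t ay = ry < 0 ? -ry : ry;
    int64_t az = rz < 0 ? -rz : rz;
    return (ay <= az) ? y : z;
}
```

On an exact tie (the two squares equidistant from x), round-to-nearest-even would pick the candidate with the even significand; the sketch simply prefers y.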

If you are not sure you are within one LSB of the final result (but you are sure you are not much farther than that), you should repeat the above until y² − x changes sign or hits 0. Watch out for edge cases, though, which are essentially the cases where the exponent is adjusted because the mantissa crosses a power of 2.

It can also be helpful to remember that positive floating point numbers can be correctly compared as integers, at least on those machines where 1.0F is 0x3f800000.
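
For example (a sketch; it assumes IEEE-754 single precision and that floats and integers share the same endianness):

```c
#include <stdint.h>
#include <string.h>

/* For positive finite floats, the IEEE-754 bit patterns order the
   same way as the values, so an integer compare suffices. */
static int less_than_positive(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    return ua < ub;
}
```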

Walter Tross
  • @starbox my FP is too rusty now to find an example of what you assert, but even so, rounding should affect only the least significant bit, which means that, even in such a case, you can rest assured that you can find no better than y1. Unfortunately I have to go to bed now :-( – Walter Tross Jul 17 '13 at 22:06
  • I am assuming you are using some variant of Newton-Raphson to compute the square root. If you use FMA as a building block, consult Peter Markstein's publications (in particular his book "IA-64 and Elementary Functions") as to how to round sqrt correctly. If you don't have FMA, there is "Tuckerman rounding", which however I haven't used myself. You can use integer operations to compute the residual x-sqrt(x)*sqrt(x) as well, with the round-to-nearest result being the one of two neighboring 32-bit floats that has generated the smaller residual. – njuffa Jul 17 '13 at 22:07
  • @njuffa, no FMA. I am using a variant of the Newton-Raphson method (see question). I also posted a link to another question with the algorithm I am using (see the accepted answer in the link). – Veridian Jul 17 '13 at 22:12
  • @starbox If y1 is obtained from y by correcting it in the right direction, it must be better, even if y^2==y1^2 – Walter Tross Jul 17 '13 at 22:15
  • You would want to compute the residual using the full-width product of the tentative result, so as to get the trailing bits. That is why FMA is so useful. In the absence of FMA, I have successfully used integer arithmetic for the residual computation (see __fsqrt_rn() in the file device_functions.h of CUDA). – njuffa Jul 17 '13 at 22:58
  • @starbox: It is not possible that two different same-sign IEEE-754 binary floating-point values can produce the same square, in the absence of overflow or underflow. Consider a positive s and s+u, where u is the ULP of s. Then (s+u)**2 is s**2 + 2*s*u + u*u. The ULP in this region is at most 2*s times u, so (s+u)**2 exceeds s**2 by more than an ULP. Therefore, it cannot round to the same floating-point value as s**2. – Eric Postpischil Jul 17 '13 at 23:05
  • @WalterTross, What if your answer is incorrect by more than 1 ULP? Or what if the subsequent 2 squarings (y1 and y) and the subsequent 2 subtractions are very expensive in terms of clock cycles and instruction count? – Veridian Jul 17 '13 at 23:37
  • Commonly used approaches to generate round-to-nearest results require that the preliminary result is within one ulp of the final result. It may additionally be necessary for the preliminary result to always fall on the same side of the true result, i.e. either always high or always low. – njuffa Jul 18 '13 at 00:25
  • Comparing the squares will give you wrong results in a handful of cases. (Unfortunately, I haven't got time to produce a few of those right now.) – tmyklebu Jul 18 '13 at 02:17
  • @starbox the answer to the first part of your question should be in my answer (and it's essentially looping, which you could unroll if you know the maximum number of iterations). The answer to the second part... I really don't know, but if I were you I would first explore the situation with the help of tmyklebu's answer – Walter Tross Jul 18 '13 at 12:15
  • tmyklebu's answer involved taking the formula (s*s-foo)/s and doing one last Newton iteration. Mine involves the formula Xi*(3/2 - (1/2)*r*Xi^2) (see the sketch after these comments). So I'm not 100% sure how to apply his formula to this reciprocal method. – Veridian Jul 18 '13 at 16:21
  • @starbox I was referring to the comparison of all output values to the output of `sqrtf`. You could count: hits, off by +1, off by +2, off by -1, off by -2 (least significant bits), etc. Except for the "corner cases", this is approximated by the difference of the values seen as ints. – Walter Tross Jul 18 '13 at 16:51
  • @WalterTross, I just read your question edit (from 9 hours ago). I only have a 32-bit FP unit, no 32-bit hardware multiplier. Multiplies are 16-bit × 16-bit, with the result stored in a 32-bit accumulator. – Veridian Jul 18 '13 at 17:55
  • @starbox, now I see why you would like to keep integer operations to a minimum. I don't know whether I'll have the time to do any further analysis. In any case, could you tell me how many iterations you currently do and with what starting value? – Walter Tross Jul 19 '13 at 09:11
  • @WalterTross, I follow the algorithm posted in the link in my question; it has all the details. – Veridian Jul 19 '13 at 15:10
  • @starbox, so the starting value is `1.1 - 1/6 * r`, where `r` is the input number "reduced" to the range [1, 4), and you do 4 iterations, like in the example? – Walter Tross Jul 19 '13 at 15:21
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/33781/discussion-between-starbox-and-walter-tross) – Veridian Jul 19 '13 at 16:09
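
For reference, a minimal sketch in C of the reciprocal-square-root Newton-Raphson scheme discussed above. The starting value and iteration count follow the comments; the function names are illustrative, and the range reduction and exponent restoration of the asker's actual assembly are only summarized in the comments:

```c
/* Newton-Raphson for 1/sqrt(r): Xi+1 = Xi * (3/2 - (1/2) * r * Xi^2).
   r is assumed already reduced to [1, 4) by halving the exponent of
   the original input; sqrt is then recovered as r * (1/sqrt(r)),
   with the saved exponent restored afterwards. */
static float rsqrt_newton(float r)
{
    float x = 1.1f - r / 6.0f;       /* starting value per the comments */
    for (int i = 0; i < 4; i++)      /* 4 iterations per the comments   */
        x = x * (1.5f - 0.5f * r * x * x);
    return x;
}

/* The final multiply is where the rounding problem lives: without the
   residual check discussed above, r * rsqrt_newton(r) is close to
   sqrt(r) but not always the correctly rounded result. */
static float sqrt_approx(float r)
{
    return r * rsqrt_newton(r);
}
```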