Why not square the result, and if it's not equal to the input, add or subtract (depending on the sign of the difference) a least significant bit, square, and check whether that would have given a better result?
Better here could mean with less absolute difference. The only case where this could get tricky is when "crossing" √2 with the mantissa, but this could be checked once and for all.
EDIT
I realize that the above answer is insufficient. Simply squaring in 32-bit FP and comparing to the input doesn't give you enough information. Let's say y = your_sqrt(x). You compare y2 to x, find that y2>x, subtract 1 LSB from y obtaining z (y1 in your comments), then compare z2 to x and find that not only z2<x, but, within the available bits, y2-x==x-z2 - how do you choose between y and z? You should either work with all the bits (I guess this is what you were looking for), or at least with more bits (which I guess is what njuffa is suggesting).
From a comment of yours I suspect you are on strictly 32-bit hardware, but let me suppose that you have a 32-bit by 32-bit integer multiplication with 64-bit result available (if not, it can be constructed). If you take the 23 bits of the mantissa of y as an integer, put a 1 in front, and multiply it by itself, you have a number that, except for a possible extra shift by 1, you can directly compare to the mantissa of x treated the same way. This way you have all 48 bits available for the comparison, and can decide without any approximation whether abs(y2-x)≷abs(z2-x).
If you are not sure to be within one LSB from the final result (but you are sure not to be much farther than that), you should repeat the above until y2-x changes sign or hits 0. Watch out for edge cases, though, which should essentially be the cases when the exponent is adjusted because the mantissa crosses a power of 2.
It can also be helpful to remember that positive floating point numbers can be correctly compared as integers, at least on those machines where 1.0F is 0x3f800000.